A Connectome and Analysis of the Adult Drosophila Central Brain
Evaluation summary
This is a landmark paper and a tour-de-force that ties together decades of advances in electron microscopy to produce a dataset of both breadth and extreme technical quality whose very existence will have profound and lasting influence on neuroscience. The manuscript is extensive and well-illustrated, and the data, methods and analyses are made available to the community in an exemplary manner. The work represents ambitious, large-scale biological resource generation at its apotheosis.
Abstract
The neural circuits responsible for animal behavior remain largely unknown. We summarize new methods and present the circuitry of a large fraction of the brain of the fruit fly Drosophila melanogaster. Improved methods include new procedures to prepare, image, align, segment, find synapses in, and proofread such large data sets. We define cell types, refine computational compartments, and provide an exhaustive atlas of cell examples and types, many of them novel. We provide detailed circuits consisting of neurons and their chemical synapses for most of the central brain. We make the data public and simplify access, reducing the effort needed to answer circuit questions, and provide procedures linking the neurons defined by our analysis with genetic reagents. Biologically, we examine distributions of connection strengths, neural motifs on different scales, electrical consequences of compartmentalization, and evidence that maximizing packing density is an important criterion in the evolution of the fly’s brain.
Introduction
The connectome we present is a dense reconstruction of a portion of the central brain (referred to here as the hemibrain) of the fruit fly, Drosophila melanogaster, as shown in Figure 1. This region was chosen since it contains all the circuits of the central brain (assuming bilateral symmetry), and in particular contains circuits critical to unlocking mysteries involving associative learning in the mushroom body, navigation and sleep in the central complex, and circadian rhythms among clock circuits. The largest dense reconstruction to date, it contains around 25,000 neurons, most of which were rigorously clustered and named, with about 20 × 10⁶ chemical synapses between them, plus portions of many other neurons truncated by the boundary of the data set (details in Figure 1 below). Each neuron is documented at many levels: the detailed voxels that constitute it, a skeleton with segment diameters, its synaptic partners, and the location of most of their synapses.
The hemibrain and some basic statistics. The highlighted area shows the portion of the central brain that was imaged and reconstructed, superimposed on a grayscale representation of the entire Drosophila brain. For the table, a neuron is traced if all its main branches within the volume are reconstructed. A neuron is considered uncropped if most arbors (though perhaps not the soma) are contained in the volume. Others are considered cropped. Note: 1) our definition of cropped is somewhat subjective; 2) the usefulness of a cropped neuron depends on the application; and 3) some small fragments are known to be distinct neurons. For simplicity, we will often state that the hemibrain contains ≈ 25K neurons.
Producing this data set required advances in sample preparation, imaging, image alignment, machine segmentation of cells, synapse detection, data storage, proofreading software, and protocols to arbitrate each decision. A number of new tests for estimating the completeness and accuracy were required and therefore developed, in order to verify the correctness of the connectome.
These data describe whole-brain properties and circuits, and are accompanied by new methods to classify cell types based on connectivity. Computational compartments are now more carefully defined, actual synaptic circuits are identified, and each neuron is annotated by name and putative cell type, making this the first complete census of neuropils, tracts, cells, and connections in this portion of the brain. We compare the statistics and structure of different brain regions, and of the brain as a whole, without the confounds introduced by studying different circuitry in different animals.
All data are publicly available through web interfaces. This includes a browser interface, NeuPrint (Clements et al., 2020), designed so that any interested user can query the hemibrain connectome even without specific training. NeuPrint can query the connectivity, partners, connection strengths and morphologies of all specified neurons, thus making identification of upstream and downstream partners orders of magnitude easier than through existing genetic methods. In addition, for those who are willing to program, the full data set (the grayscale voxels, the segmentation and proofreading results, skeletons, and graph model of connectivity) is also available through publicly accessible application program interfaces (APIs).
This effort differs from previous EM reconstructions in its social and collaborative aspects. Previous reconstructions were either dense in much smaller EM volumes (such as (Meinertzhagen and O’Neil, 1991)(Helmstaedter et al., 2013)(Takemura et al., 2017)) or sparse in larger volumes (such as (Eichler et al., 2017) or (Zheng et al., 2018)). All have concentrated on the reconstruction of specific circuits to answer specific questions. When the same EM volume is used for many such efforts, as has occurred in the Drosophila larva and the full adult fly brain, this leads to an overall reconstruction that is the union of many individual efforts (Saalfeld et al., 2009). The result is inconsistent coverage of the brain, with some regions well reconstructed and others missing entirely. In contrast, here we have analyzed the entire volume, not just the subsets of interest to specific groups of researchers with the expertise to tackle EM reconstruction. We are making these data available without restriction, with only the requirement to cite the source. This allows the benefits of known circuits and connectivity to accrue to the field as a whole, a much larger audience than those with expertise in EM reconstruction. This is analogous to progress in genomics, which transitioned from individual groups studying subsets of genes, to publicly available genomes that can be queried for information about genes of choice (Altschul et al., 1990).
One major benefit to this effort is to facilitate research into the circuits of the fly’s brain. A common question among researchers, for example, is the identity of upstream and downstream (respectively input and output) partners of specific neurons. Previously this could only be addressed by genetic trans-synaptic labelling, such as trans-Tango (Talay et al., 2017), or by sparse tracing in previously imaged EM volumes (Zheng et al., 2018). However, the genetic methods may give false positives and negatives, and both alternatives require specialized expertise and are time consuming, often taking months of effort. Now, for any circuits contained in our volume, a researcher can obtain the same answers in seconds by querying a publicly available database.
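As a rough illustration of what such a partner query amounts to, the sketch below looks up upstream and downstream partners in a tiny weighted edge list. It is not the NeuPrint interface: the neuron names, synapse counts, and the helper names `build_index` and `partners` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy connectome: (presynaptic, postsynaptic) -> synapse count.
# Neuron names and counts are invented for illustration only.
EDGES = {
    ("PN_1", "KC_1"): 12,
    ("PN_1", "KC_2"): 3,
    ("PN_2", "KC_1"): 7,
    ("KC_1", "MBON_1"): 25,
    ("KC_2", "MBON_1"): 9,
}


def build_index(edges):
    """Index the edge list by target (inputs) and by source (outputs)."""
    upstream = defaultdict(dict)
    downstream = defaultdict(dict)
    for (pre, post), count in edges.items():
        downstream[pre][post] = count
        upstream[post][pre] = count
    return upstream, downstream


def partners(index, neuron, min_synapses=1):
    """Return (partner, synapse count) pairs, strongest connection first."""
    hits = [(p, n) for p, n in index.get(neuron, {}).items() if n >= min_synapses]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


upstream, downstream = build_index(EDGES)
print(partners(upstream, "KC_1"))    # inputs to KC_1
print(partners(downstream, "KC_1"))  # outputs of KC_1
```

Indexing the graph once in both directions makes each input or output lookup a single dictionary access, which is the reason a database query can replace months of tracing for circuits already in the volume.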
Another major benefit of dense reconstruction is its exhaustive nature. Genetic methods such as stochastic labeling may miss cell types, and counts of cells of a given type are dependent on expression levels, which are always uncertain. Previous dense reconstructions have demonstrated that existing catalogs of cell types are incomplete, even in well-covered regions(Takemura et al., 2017). In our hemibrain sample, we have identified all the cells within the reconstructed volume, thus providing a complete and unbiased census of all cell types in the fly’s central brain (at least in this single female), and a precise count of the instances of each type.
Another scientific benefit lies in an analysis without the uncertainty of pooling data obtained from different animals. The detailed circuitry of the fly’s brain is known to depend on nutritional history, age, and circadian rhythm. Here these factors are held constant, as are the experimental methods, facilitating comparison between different fly brain regions in this single animal. Evaluating stereotypy across animals will of course eventually require additional connectomes.
Previous reconstructions of compartmentalized brains have concentrated on particular regions and circuits. The mammalian retina(Helmstaedter et al., 2013) and cortex(Kasthuri et al., 2015), and insect mushroom bodies(Eichler et al., 2017)(Takemura et al., 2017) and optic lobes(Takemura et al., 2015) have all been popular targets. Additional studies have examined circuits that cross regions, such as those for sensory integration(Ohyama et al., 2015) or motion vision(Shinomiya et al., 2019).
So far lacking are systematic studies of the statistical properties of computational compartments and their connections. Neural circuit motifs have been studied (Song et al., 2005), but only those restricted to small motifs and at most a few cell types, usually in a single portion of the brain. Many of these results are in mammals, leading to questions of whether they also apply to invertebrates, and whether they extend to other regions of the brain. While there have been efforts to build reduced, but still accurate, electrical models of neurons (Marasco et al., 2012), none of these to our knowledge have used the compartment structure of the brain.
What is included
Figure 2 shows the hierarchy of the named brain regions that are included in the hemibrain. Table 1 shows the primary regions that are at least 50% included in the hemibrain sample, their approximate size, and their completion percentage. Our names for brain regions follow the conventions of (Ito et al., 2014) with the addition of ‘(L)’ or ‘(R)’ to indicate whether the region (most of which occur on both sides of the fly) has its cell bodies in the left or right, respectively. The mushroom body(Tanaka et al., 2008)(Aso et al., 2014) and central complex(Wolff et al., 2015) are further divided into finer compartments.
Brain regions contained and defined in the hemibrain, following the naming conventions of (Ito et al., 2014) with the addition of (R) and (L) to specify the side of the soma for that region. Gray italics indicate master regions not explicitly defined in the hemibrain. Region LA is not included in the volume. The regions are hierarchical, with the more indented regions forming subsets of the less indented. The only exceptions are dACA, lACA, and vACA which are considered part of the mushroom body but are not contained in the master region MB.
The supplementary material includes a section on the known sensory inputs and motor outputs included in the volume.
Differences from connectomes of vertebrates
Most accounts of neurobiology define the operation of the mammalian nervous system with, at most, only passing reference to invertebrate brains. Fly (or other insect) nervous systems differ from those of vertebrates in several respects(Meinertzhagen, 2016b). Some main differences include:
- Most synapses are polyadic. Each synapse structure comprises a single presynaptic release site and, adjacent to this, several neurites expressing neurotransmitter receptors. An element, T-shaped and typically called a T-bar in flies, marks the site of transmitter release into the cleft between cells. This site typically abuts the neurites of several other cells, where a postsynaptic density (PSD) marks the receptor location.
- Most neurites are neither purely axonic nor dendritic, but have both pre- and postsynaptic partners, a feature that may be more prominent in mammalian brains than recognized (Morgan and Lichtman, 2020). Within a single brain region, however, neurites are frequently predominantly dendritic (postsynaptic) or axonic (presynaptic).
- Unlike some synapses in mammals, EM imagery (at least as we have acquired and analyzed it here) fails to reveal obvious information about whether a synapse is excitatory or inhibitory.
- The soma or cell body of each fly neuron resides in a rind (the cell body layer) on the periphery of the brain, mostly disjoint from the main neurites innervating the internal neuropil. As a result, unlike vertebrate neurons, no synapses form directly on the soma. The neuronal process between the soma and the first branch point is the cell body fiber (CBF), which is likewise not involved in the synaptic transmission of information.
- Synapse sizes are much more uniform than those of mammals. Stronger connections are formed by increasing the number of synapses in parallel, not by forming larger synapses, as in vertebrates. In this paper we will refer to the ‘strength’ of a connection as the synapse count, even though we acknowledge that we lack information on the relative activity and strength of the synapses, and thus a true measure of their coupling strength.
- The brain is small, about 250 µm per side, and has roughly the same size as the dendritic arbor of a single pyramidal neuron in the mammalian cortex.
- Axons of fly neurons are not myelinated.
- Some fly neurons rely on graded transmission (as opposed to spiking), without obvious anatomical distinction. Some neurons even switch between graded and spiking operation (Pimentel et al., 2016).
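Two of the points in the list above, polyadic synapses and connection strength measured as a synapse count, can be made concrete with a short sketch. The records below are invented, and counting each T-bar-to-PSD contact as one synapse is an assumption made here for illustration; the text at this point does not spell out the counting convention.

```python
from collections import Counter

# Hypothetical records: each polyadic synapse has one presynaptic T-bar and
# several postsynaptic densities (PSDs), possibly on the same partner neuron.
SYNAPSES = [
    {"tbar_neuron": "A", "psd_neurons": ["B", "C", "C"]},
    {"tbar_neuron": "A", "psd_neurons": ["B", "D"]},
    {"tbar_neuron": "B", "psd_neurons": ["A"]},
]


def connection_strengths(synapses):
    """Count T-bar-to-PSD contacts per (presynaptic, postsynaptic) pair."""
    strengths = Counter()
    for syn in synapses:
        for post in syn["psd_neurons"]:
            strengths[(syn["tbar_neuron"], post)] += 1
    return strengths


strengths = connection_strengths(SYNAPSES)
for (pre, post), count in sorted(strengths.items()):
    print(f"{pre} -> {post}: {count}")
```

In this toy data neuron A is presynaptic to B and also postsynaptic to it, mirroring the observation that most neurites carry both inputs and outputs.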
Connectome Reconstruction
Producing a connectome comprising reconstructed neurons and the chemical synapses between them required several steps. The first step, preparing a fly brain and imaging half of its center, produced a dataset consisting of 26 teravoxels of data, each with 8 bits of information. We applied numerous machine learning algorithms and over 50 person-years of proofreading effort over 2 calendar years to extract a variety of more compact and useful representations, such as neuron skeletons, synapse locations, and connectivity graphs. These are both more useful and much smaller than the raw grayscale data. For example, the connectivity could be reasonably summarized by a graph with ≈25,000 nodes and ≈3 million edges. Even when the connections were assigned to different brain regions, such a graph took only 26 MB, still large but roughly a million fold reduction in data size.
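The data-reduction figures quoted above can be checked with a few lines of arithmetic. The sketch assumes decimal (SI) prefixes and one byte per 8-bit voxel; the per-edge and per-node figures are derived here and do not appear in the text.

```python
# Figures quoted in the text; decimal (SI) prefixes are an assumption.
raw_voxels = 26e12        # 26 teravoxels
bytes_per_voxel = 1       # 8 bits of information per voxel
raw_bytes = raw_voxels * bytes_per_voxel  # about 26 TB of grayscale data

graph_bytes = 26e6        # 26 MB connectivity graph
reduction = raw_bytes / graph_bytes       # roughly a million-fold

nodes, edges = 25_000, 3_000_000
bytes_per_edge = graph_bytes / edges      # storage per graph edge
mean_out_degree = edges / nodes           # average partners per neuron

print(f"reduction factor: {reduction:.0e}")
print(f"bytes per edge: {bytes_per_edge:.1f}")
print(f"mean out-degree: {mean_out_degree:.0f}")
```

Under these assumptions the graph costs under ten bytes per edge, and each neuron has on the order of a hundred downstream partners.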
Many of the supporting methods for this reconstruction have been recently published. Here we briefly survey each major area, with more details reported in the companion papers. Major advances include:
- New methods to fix and stain the sample, preparing a whole fly brain with well-preserved subcellular detail particularly suitable for machine analysis.
- Methods that have enabled us to collect the largest EM dataset yet using Focused Ion Beam Scanning Electron Microscopy (FIB-SEM), resulting in isotropic data with few artifacts, features that significantly speed up reconstruction.
- A coarse-to-fine, automated flood-filling network segmentation pipeline applied to image data normalized with cycle-consistent generative adversarial networks, and an aggressive automated agglomeration regime enabled by advances in proofreading.
- A new hybrid synapse prediction method, using two differing underlying techniques, for accurate synapse prediction throughout the volume.
- New top-down proofreading methods that utilize visualization and machine learning to achieve orders of magnitude faster reconstruction compared with previous approaches in the fly’s brain.
Each of these is explained in more detail in the following sections and, where necessary, in the Supplemental Methods.
Image stack collection
The first steps, fixing and staining the specimen, have been accomplished taking advantage of three new developments. These improved methods allow us to fix and stain a full fly’s brain but nevertheless recover neurons as round profiles with darkly stained synapses, suitable for machine segmentation and automatic synapse detection. Starting with a five day old female of wild-type Canton S strain G1 x w1118, we used a custom-made jig to microdissect the central nervous system, which was then fixed and embedded in Epon, an epoxy resin. We then enhanced the electron contrast by staining with heavy metals, and progressively lowered the temperature during dehydration of the sample. Collectively these methods optimize morphological preservation, allow full-brain preparation without distortion (unlike fast freezing methods), and provide increased staining intensity that speeds the rate of FIB-SEM imaging(Lu et al., 2019).
The hemibrain sample is roughly 250 x 250 x 250 µm, larger than we can image by FIB-SEM without introducing milling artifacts. Therefore we subdivided our epoxy-embedded samples into 20 µm thick slabs, both to avoid artifacts and allow imaging in parallel (each slab imaged in a different FIB machine) for increased throughput. To be effective, the cut surfaces of the slabs must be smooth at the ultrastructural level and have only minimal material loss. Specifically, for connectomic research, all long-distance processes must remain traceable across sequential slabs. We used an improved version of our previously published ‘hot-knife’ ultrathick sectioning procedure (Hayworth et al., 2015), which uses a heated, oil-lubricated diamond knife, to section the Drosophila brain into 37 sagittal slabs of 20 µm thickness with an estimated material loss between consecutive slabs of only ∼30 nm - sufficiently small to allow tracing of long-distance neurites. Each slab was re-embedded, mounted, and trimmed, then examined in 3-D with X-ray tomography to check for sample quality and establish a scale factor for Z-axis cutting by FIB. The resulting slabs were FIB-SEM imaged separately (often in parallel, for increased throughput) and the resulting volume datasets were stitched together computationally.
Connectome studies come with clearly defined resolution requirements - the finest neurites must be traceable by humans and should be reliably segmented by automated algorithms(Januszewski et al., 2018). In Drosophila, the very finest neural processes are usually 50 nm but can be as little as 15 nm(Meinertzhagen, 2016a). This fundamental biological dimension determines the minimum isotropic resolution requirements for tracing neural circuits. To meet the demand for high isotropic resolution and large volume imaging, we chose the FIB-SEM imaging platform, which offers high isotropic resolution (< 10 nm in x, y, and z), minimal artifacts, and robust image alignment. The high-resolution and isotropic dataset possible with FIB-SEM has substantially expedited the Drosophila connectome pipeline. Compared to serial-section imaging, with its sectioning artifacts and inferior Z-axis resolution, FIB-SEM offers high quality image alignment, a smaller number of artifacts, and isotropic resolution. This allows higher quality automated segmentation and makes manual proofreading and correction easier and faster.
At the beginning, deficiencies in imaging speed and system reliability of any commercial FIB-SEM system capped the maximum possible image volume to less than 0.01% of a full fly brain, problems that persist even now. To remedy them, we redesigned the entire control system, improved the imaging speed more than 10x, and created innovative solutions addressing all known failure modes, which thereby expanded the practical imaging volume of conventional FIB-SEM by more than four orders of magnitude from 10³ µm³ to 3 × 10⁷ µm³, while maintaining an isotropic resolution of 8 x 8 x 8 nm voxels (Xu et al., 2017)(Xu et al., 2019). In order to overcome the aberration of a large field of view (up to 300 µm wide), we developed a novel tiling approach without sample stage movement, in which the imaging parameters of each tile are individually optimized through an in-line auto focus routine without overhead (Xu et al., 2018). After numerous improvements, we have transformed the conventional FIB-SEM from a laboratory tool that is unreliable for more than a few days of imaging to a robust volume EM platform with effective long-term reliability, able to perform years of continuous imaging without defects in the final image stack. Imaging time, rather than FIB-SEM reliability, is now the main impediment to obtaining even larger volumes.
In our study here, the Drosophila “hemibrain”, thirteen consecutive hot-knifed slabs were imaged using two customized enhanced FIB-SEM systems, in which an FEI Magnum FIB column was mounted at 90° upon a Zeiss Merlin SEM. After data collection, streaking artifacts generated by secondary electrons along the FIB milling direction were computationally removed using a mask in the frequency domain. The image stacks were then aligned using a customized version of the software platform developed for serial section transmission electron microscopy (Zheng et al., 2018)(Khairy et al., 2018), followed by binning along the z-axis to form the final 8 x 8 x 8 nm³ voxel datasets. Milling thickness variations in the aligned series were compensated using a modified version of the method described by Hanslovsky et al. (Hanslovsky et al., 2017), with the absolute scale calibrated by reference to the MicroCT images.
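The idea of removing streaks with a frequency-domain mask can be sketched on a toy image. Streaks that are constant along one image axis concentrate their energy on a single line of the 2-D spectrum, so zeroing that line (apart from its lowest frequencies) removes them. The sketch below is an assumption-laden illustration and not the authors' pipeline: the mask shape, the streak direction (constant along y), the parameter `keep_low`, and the function names are hypothetical, and a naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math


def dft(seq, inverse=False):
    """Naive 1-D discrete Fourier transform (O(n^2)); fine for a tiny demo."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [
        sum(x * cmath.exp(sign * 2j * math.pi * k * t / n) for t, x in enumerate(seq))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft2(img, inverse=False):
    """2-D transform: 1-D transform of every row, then of every column."""
    rows = [dft(r, inverse) for r in img]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # indexed as [ky][kx]


def remove_streaks(img, keep_low=1):
    """Zero the ky == 0 line of the spectrum, except the lowest |kx| terms."""
    spec = dft2(img)
    width = len(spec[0])
    for kx in range(width):
        if min(kx, width - kx) > keep_low:
            spec[0][kx] = 0
    return [[v.real for v in row] for row in dft2(spec, inverse=True)]


N = 8
# Smooth synthetic "tissue" signal that varies in both x and y.
signal = [[100 + 10 * math.cos(2 * math.pi * y / N) * math.cos(2 * math.pi * x / N)
           for x in range(N)] for y in range(N)]
# Streaks: a pattern in x that is identical in every row (constant along y).
streaks = [5 * math.cos(2 * math.pi * 3 * x / N) for x in range(N)]
noisy = [[signal[y][x] + streaks[x] for x in range(N)] for y in range(N)]

cleaned = remove_streaks(noisy)
max_err = max(abs(cleaned[y][x] - signal[y][x]) for y in range(N) for x in range(N))
print(f"largest residual after masking: {max_err:.2e}")
```

Keeping the lowest frequencies on the masked line preserves the mean brightness and slow shading; a production implementation would use an FFT library and typically a softer, wedge-shaped mask.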
The 20 µm slabs generated by the hot-knife sectioning were re-embedded in larger plastic tabs prior to FIB-SEM imaging. To correct for the warping of the slab that can occur in this process, methods adapted from Kainmueller (Kainmueller et al., 2008) were used to find the tissue-plastic interface and flatten each slab’s image stack.
The series of flattened slabs was then stitched using a custom method for large scale deformable registration to account for deformations introduced during sectioning, imaging, embedding, and alignment (Saalfeld et al. in prep). These volumes were then contrast adjusted using slice-wise contrast limited adaptive histogram equalization (CLAHE)(Pizer et al., 1987), and converted into a versioned database(Distributed, Versioned, Image-oriented Database, or DVID), which formed the raw data for the reconstruction, as illustrated in Figure 3.
The 13 slabs of the hemibrain, each flattened and co-aligned. Colors are arbitrary and added to the monochrome data to define the brain regions, as computed in section 2.5.
Automated Segmentation
Computational reconstruction of the image data was performed using flood-filling networks (FFNs) trained on roughly five-billion voxels of volumetric ground truth contained in two tabs of the hemibrain dataset(Januszewski et al., 2018). Initially, the FFNs generalized poorly to other tabs of the hemibrain, whose image content had a different appearance. Therefore we adjusted the image content to be more uniform using cycle-consistent generative adversarial networks (CycleGANs)(Zhu et al., 2017). Specifically, “generator” networks were trained to alter image content such that a second “discriminator” network was unable to distinguish between image patches sampled from, for example, a tab that contained volumetric training data versus a tab that did not. A cycle-consistency constraint was used to ensure that the image transformations preserved ultrastructural detail. The improvement is illustrated in Figure 4. Overall, this allowed us to use the training data from just two slabs, as opposed to needing training data for each slab.
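For reference, the cycle-consistency term in the CycleGAN formulation of Zhu et al. (2017) is usually written as below. Here G maps image patches from one appearance domain X to another domain Y, and F maps back. Which tabs play the roles of X and Y, and whether exactly this form and weighting were used for the hemibrain, are assumptions not stated in the text.

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_X}\!\left[ \lVert F(G(x)) - x \rVert_1 \right]
  + \mathbb{E}_{y \sim p_Y}\!\left[ \lVert G(F(y)) - y \rVert_1 \right]
```

Minimizing this term penalizes any transformation that cannot be undone, which is what discourages the generators from inventing or erasing ultrastructural detail while they change image appearance.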
(a) Original EM data from tab 34 at a resolution of 16 nm per voxel, (b) EM data after CycleGAN processing, (c-d) FFN segmentation results with the 16 nm model applied to original and processed data, respectively. Scale bar in (a) represents 1 µm.
FFNs were applied to the CycleGAN-normalized data in a coarse-to-fine manner at 32×32×32 nm³ and 16×16×16 nm³, and to the CLAHE-normalized data at the native 8×8×8 nm³ resolution, in order to generate a base segmentation that was largely over-segmented. We then agglomerated the base segmentation, also using FFNs. We aggressively agglomerated segments despite introducing substantial numbers of erroneous mergers. This differs from previous algorithms, which studiously avoided merge errors since they were so difficult to fix. Here, advances in proofreading methodology described elsewhere in this report enabled efficient detection and correction of such mergers.
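The trade-off between conservative and aggressive agglomeration can be illustrated with a generic union-find sketch. This is not the FFN-based agglomeration used here: the pairwise scores, the thresholds, and the function name `agglomerate` are hypothetical, and the only point is that lowering the merge threshold joins more fragments at the risk of false merges.

```python
def agglomerate(num_segments, scored_pairs, threshold):
    """Merge segment pairs whose score meets the threshold (union-find)."""
    parent = list(range(num_segments))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, score in scored_pairs:
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)
    return [find(i) for i in range(num_segments)]


# Hypothetical merge scores between five base segments.
PAIRS = [(0, 1, 0.95), (1, 2, 0.6), (3, 4, 0.9), (2, 3, 0.3)]

conservative = agglomerate(5, PAIRS, threshold=0.9)
aggressive = agglomerate(5, PAIRS, threshold=0.5)
very_aggressive = agglomerate(5, PAIRS, threshold=0.2)
print(conservative, aggressive, very_aggressive)
```

If segments 0 to 2 and segments 3 to 4 belong to two different neurons, the middle setting reconstructs both correctly, while the lowest threshold fuses them into one object that a proofreader would then have to split.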
We evaluated the accuracy of the FFN segmentation of the hemibrain using metrics for expected run length (ERL) and merge rate(Januszewski et al., 2018). The base segmentation (i.e., the automated reconstruction prior to agglomeration) achieved an ERL of 163 µm with a merge rate of 0.25%. After (automated) agglomeration, run length increased to 585 µm but with a false merge rate of 27.6% (i.e., nearly 30% of the path length was contained in segments with at least one merge error). We also evaluated a subset of neurons in the volume, ∼500 olfactory PN and KC cells chosen to roughly match the evaluation performed in (Li et al., 2019) which yielded an ERL of 825 µm at a 15.9% merge rate.
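A simplified reading of these two metrics is sketched below. It assumes that a ground-truth skeleton has already been divided into stretches, each covered by one automated segment and flagged if that segment contains a merge error. Merged stretches contribute nothing to ERL, and the merge rate is the fraction of path length they occupy. The function name and the example lengths are hypothetical, and the published metric involves details this sketch omits.

```python
def erl_and_merge_rate(runs):
    """Compute a simplified expected run length and merge rate.

    runs: list of (path_length_um, has_merge) tuples, one per stretch of
    ground-truth skeleton covered by a single automated segment.
    """
    total = sum(length for length, _ in runs)
    # A point sampled uniformly along the skeleton lands in a run with
    # probability length / total and then "sees" an error-free run of that
    # length; runs inside merged segments count as length zero.
    erl = sum(length ** 2 for length, merged in runs if not merged) / total
    merge_rate = sum(length for length, merged in runs if merged) / total
    return erl, merge_rate


# Hypothetical skeleton: 200 µm of path split into three stretches.
erl, merge_rate = erl_and_merge_rate([(100.0, False), (50.0, False), (50.0, True)])
print(f"ERL = {erl} µm, merge rate = {merge_rate:.0%}")
```

Because lengths enter squared, joining fragments raises ERL quickly, but any stretch that ends up in a merged segment drops out entirely, which is why the two numbers are reported together.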
Synapse Prediction
Accurate synapse identification is central to our analysis, given that synapses form both a critical component of a connectome and are required for prioritizing and guiding the proofreading effort. Synapses in Drosophila are typically polyadic, with a single presynaptic site (a T-bar) contacted by multiple receiving dendrites (most with PSDs, postsynaptic densities) as shown in Figure 5a. Initial synapse prediction revealed that there are over 9 million T-bars and 60 million PSDs in the hemibrain. Manually validating each one, assuming a rate of 1000 connections annotated per trained person per day, would have taken more than 230 working years. Given this infeasibility, we developed machine learning approaches to predict synapses as detailed below. The results of our prediction are shown in Figure 5b, where the predicted synapse sites clearly delineate many of the fly brain regions.
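The estimate of more than 230 working years follows from the numbers in the paragraph above if one connection is counted per PSD and a working year is taken to be about 260 days. The 260-day figure is an assumption introduced here; the text does not state it.

```python
psd_connections = 60_000_000   # PSDs found by the initial prediction
rate_per_day = 1_000           # connections annotated per person per day
working_days_per_year = 260    # assumption: 52 weeks x 5 days

person_days = psd_connections / rate_per_day
person_years = person_days / working_days_per_year
print(f"{person_days:.0f} person-days, about {person_years:.0f} working years")
```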
Well-preserved membranes, darkly stained synapses, and smooth round neurite profiles are characteristics of the hemibrain sample. Panel (a) shows polyadic synapses, with a red arrow indicating the presynaptic T-bar, and white triangles pointing to the postsynaptic densities. Mitochondria (‘M’), synaptic vesicles (‘SV’), and the scale bar (0.5 µm) are shown. Panel (b) shows a cross section through a point cloud of all detected synapses. This EM point cloud defines many of the compartments in the fly’s brain, much like an optical image obtained using the nc82 antibody (against Bruchpilot, a component of T-bars) to stain synapses. This point cloud is used to generate the transformation from our sample to the standard Drosophila brain.
Given the size of the hemibrain image volume, a major challenge from a machine learning perspective is the range of varying image statistics across the volume. In particular, model performance can quickly degrade in regions of the data set with statistics that are not well-captured by the training set(Buhmann et al., 2019). To address this challenge, we took an iterative approach to synapse prediction, interleaving model re-training with manual proofreading, all based on previously reported methods(Huang et al., 2018). Initial prediction, followed by proofreading, revealed a number of false positive predictions from structures such as dense core vesicles which were not well-represented in the original training set. A second filtering network was trained on regions causing such false positives, and used to prune back the original set of predictions. We denote this pruned output as the ‘initial’ set of synapse predictions.
Based on this initial set, we began collecting human-annotated dense ground-truth cubes throughout the various brain regions of the hemibrain, to assess variation in classifier performance by brain region. From these cubes, we determined that although many regions had acceptable precision, there were some regions in which recall was lower than desired. Consequently, a subset of cubes available at that time was used to train a new classifier focused on addressing recall in the problematic regions. This new classifier was used in an incremental (cascaded) fashion, primarily by adding additional predictions to the existing initial set. This gave better performance than complete replacement using only the new classifier, with the resulting predictions able to improve recall while largely maintaining precision.
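The effect of adding a recall-focused classifier on top of an existing prediction set, rather than replacing it, can be seen in a toy precision and recall calculation. The site identifiers below are invented, and the sketch assumes predictions have already been matched to ground-truth sites; in practice that matching is done by spatial proximity.

```python
def precision_recall(predicted, truth):
    """Precision and recall of a predicted set against ground truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical ground-truth cube with ten annotated synapse sites (0..9).
truth = set(range(10))
initial = {0, 1, 2, 3, 4, 5, 100}   # six correct, one false positive
second = {4, 5, 6, 7, 8, 101}       # recall-focused classifier, used alone
cascaded = initial | second         # add new predictions to the initial set

for name, pred in [("initial", initial), ("second only", second), ("cascaded", cascaded)]:
    p, r = precision_recall(pred, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this example the cascade lifts recall from 0.6 to 0.9 while precision drops only slightly, whereas using the second classifier alone would lose sites the initial set had already found.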
As an independent check on synapse quality, we also trained a separate classifier (Buhmann et al., 2019), using a modified version of the ‘synful’ software package. Both synapse predictors give a confidence value associated with each synapse, a measure of how firmly the classifier believes the prediction to be a true synapse. We found that we were able to improve recall by taking the union of the two predictors’ most confident synapses, and similarly improve precision by removing synapses that were low confidence in both predictions. Figures 6a and 6b show the results, illustrating the precision and recall obtained in each brain region.
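One possible reading of this combination rule is sketched below. A site reported by only one predictor is kept if that predictor is highly confident, and a site reported by both is dropped only if both confidences are low. The thresholds, site identifiers, and the function name `combine` are hypothetical, and the sketch assumes the two predictors' sites have already been matched to each other.

```python
def combine(pred_a, pred_b, high=0.8, low=0.3):
    """Combine two {site: confidence} maps into one set of kept sites."""
    kept = set()
    for site in pred_a.keys() | pred_b.keys():
        a = pred_a.get(site)
        b = pred_b.get(site)
        if a is not None and b is not None:
            # Found by both: discard only if low confidence in both.
            if max(a, b) >= low:
                kept.add(site)
        else:
            # Found by one predictor only: keep if it is highly confident.
            conf = a if a is not None else b
            if conf >= high:
                kept.add(site)
    return kept


# Hypothetical confidences from the two predictors.
pred_a = {"s1": 0.9, "s2": 0.2, "s3": 0.5, "s4": 0.85}
pred_b = {"s1": 0.7, "s2": 0.1, "s3": 0.2, "s5": 0.4}

kept = combine(pred_a, pred_b)
print(sorted(kept))
```

Here `s4` is rescued because one predictor is confident (helping recall), while `s2` is removed because neither predictor is (helping precision).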
Precision and recall for synapse prediction, on the left for T-bars, and on the right for synapses as a whole including the identification of PSDs. T-bar identification is better than PSD identification since this organelle is both more distinct and typically occurs in larger neurites. Each dot is one brain region. The size of the dot is proportional to the volume of the region. Human proofreaders typically achieve 0.9 precision/recall on T-bars and 0.8 precision/recall on PSDs, indicated in purple.
Proofreading
Since machine segmentation is not perfect, we made a concerted effort to fix the errors remaining at this stage by several passes of human proofreading. Segmentation errors can be roughly grouped into two classes - “false merges”, in which two separate neurons are mistakenly merged together, and “false splits”, in which a single neuron is mistakenly broken into several segments. Enabled by advances in visualization and semi-automated proofreading using our Neu3 tool(Hubbard et al., 2020), we first addressed large false mergers. A human examined each putative neuron and determined if it had an unusual morphology suggesting that a merge might have occurred, a task still much easier for humans than machines. If judged to be a false merger, the operator identified discrete points that should be on separate neurons. The shape was then resegmented in real time allowing users to explore other potential corrections. Neurons with more complex problems were then scheduled to be re-checked, and the process repeated until few false mergers remained.
In the next phase, the largest remaining pieces were merged into neuron shapes using a combination of machine-suggested edits (Plaza, 2014) and manual intuition, until the main shape of each neuron emerged. This requires relatively few proofreading decisions and has the advantage of producing an almost complete neuron catalog early in the process. As discussed below, in the section on validation, emerging shapes were compared against genetic/optical image libraries (where available) and against other neurons of the same putative type, to guard against large missing or superfluous branches. These procedures (which focused on higher-level proofreading) produced a reasonably accurate library of the main branches of each neuron, and a connectome of the stronger neuronal pathways. At this point there was still considerable variation among the brain regions, with greater completeness achieved in regions where the initial segmentation performed better.
Finally, to achieve the highest reconstruction completeness possible in the time allotted, and to enable confidence in weaker neuronal pathways, proofreaders connected remaining isolated fragments (segments) to already constructed neurons, using NeuTu(Zhao et al., 2018) and Neu3(Hubbard et al., 2020). The fragments that would result in largest connectivity changes were considered first, exploiting automatic guesses through focused proofreading where possible. Since proofreading every small segment is still prohibitive, we tried to ensure a basic level of completeness throughout the brain with special focus in regions of particular biological interest such as the central complex and mushroom body.
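The idea of handling the most consequential fragments first can be sketched as a simple ranking. The sketch approximates the connectivity change a fragment would cause by the number of synapses it carries, which is an assumption made here; the fragment names, counts, and the function name `proofreading_order` are hypothetical.

```python
import heapq


def proofreading_order(fragments, budget):
    """Pick the `budget` fragments carrying the most synapses.

    fragments: dict mapping fragment id -> number of synapses on it.
    """
    return heapq.nlargest(budget, fragments, key=fragments.get)


# Hypothetical orphan fragments and their synapse counts.
FRAGMENTS = {"f1": 3, "f2": 120, "f3": 45, "f4": 8}

order = proofreading_order(FRAGMENTS, budget=2)
captured = sum(FRAGMENTS[f] for f in order) / sum(FRAGMENTS.values())
print(order, f"{captured:.0%} of orphan synapses recovered")
```

With a heavy-tailed distribution of fragment sizes, a small proofreading budget spent this way recovers most of the missing synapses, which is why exhaustive proofreading of every small segment can be avoided.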
Defining brain regions
In a parallel effort to proofreading, the sample was annotated with discrete brain regions. Our progression in mapping the cells and circuits of the fly’s brain bears formal parallels to the history of mapping the earth, with many territories that are named and with known circuits, and others that still lack all or most of these. For the hemibrain dataset the regions are based on the brain atlas in Ito et al. (Ito et al., 2014). The dataset covers most of the right hemisphere of the brain, except the optic lobe (OL), periesophageal neuropils (PENP) and gnathal ganglia (GNG), as well as part of the left hemisphere (Table 1). It covers about 36% of all synaptic neuropils by volume, and 54% of the central brain neuropils. We examined innervation patterns, synapse distribution, and connectivity of reconstructed neurons to define the neuropils as well as their boundaries on the dataset. We also made necessary, but relatively minor, revisions to some boundaries by reflecting anatomical features that had not been known during the creation of previous brain maps, while following the existing structural definitions(Ito et al., 2014). We also used information from synapse point clouds, a predicted glial mask, and a predicted fiber bundle mask to determine boundaries of the neuropils (Figure 7 A). The brain regions of the fruit fly (Figure 7, B and C) include synaptic neuropils and non-synaptic fiber bundles. The non-synaptic cell body layer on the brain surface, which contains cell bodies of the neurons and glia, surrounds these structures. The synaptic neuropils can be further categorized into two groups: delineated and diffuse neuropils. The delineated neuropils have distinct boundaries throughout their surfaces, often accompanied by glial processes, and have clear internal structures in many cases. They include the antennal lobe (AL), bulb (BU), as well as the neuropils in the optic lobe (OL), mushroom body (MB), and central complex (CX).
Remaining are the diffuse neuropils, sometimes referred to as terra incognita, since most have been less investigated than the delineated neuropils. In the previous brain atlas of 2014, boundaries of many terra incognita neuropils were determined rather arbitrarily, because little was then known about their innervating neurons.
Panel (A) A coronal section of the hemibrain dataset with synapse point clouds (white), predicted glial tissue (green), and predicted fiber bundles (magenta). (B) Grayscale image overlaid with segmented neuropils at the same level as (A). (C) A frontal view of the reconstructed neuropils. Scale bar: (A, B) 50µm.
Diffuse (terra incognita) neuropils
In the hemibrain data, we adjusted the boundaries of some terra incognita neuropils using reconstructed neurons and their synaptic sites. Examples include the lateral horn (LH), ventrolateral neuropils (VLNP), and the boundary between the crepine (CRE) and lateral accessory lobe (LAL). The LH has been defined as the primary projection target of the olfactory projection neurons (PNs) from the antennal lobe (AL) via several antennal lobe tracts (ALTs)(Ito et al., 2014)(Pereanu et al., 2010). The boundary between the LH and its surrounding neuropils is barely visible with synaptic immunolabeling such as nc82 or predicted synapse point clouds, as the synaptic contrast in these regions is minimal. The olfactory PNs can be grouped into several classes, and the projection sites of the uniglomerular PNs that project through the medial ALT (mALT), the thickest fiber bundle between the AL and LH, give the most conservative and concrete boundary of the ‘core’ LH (Figure 8A). Multiglomerular PNs, on the other hand, project to much broader regions, including the volumes around the core LH (Figure 8B). These regions include areas which are currently considered parts of the superior lateral protocerebrum (SLP) and posterior lateral protocerebrum (PLP). Since the “core” LH roughly approximates the shape of the traditional LH, and the boundaries given by the multiglomerular PNs are rather discrete, in this study we assumed the core to be the LH itself. Of course, the multiglomerular PNs convey olfactory information as well, and therefore the neighboring parts of the SLP and PLP to some extent also receive inputs from the antennal lobe. These regions might be functionally distinct from the remaining parts of the SLP or PLP, but they are not explicitly separated from those neuropils in this study.
Reconstructed brain regions and substructures. (A, B) Dorsal views of the olfactory projection neurons (PNs) and the innervated neuropils, AL, CA, and LH. Uniglomerular PNs projecting through the mALT are shown in (A), and multiglomerular PNs are shown in (B). (C, D) Columnar visual projection neurons. Each subtype of cells is color-coded. LC cells are shown in (C), and LPC, LLPC, and LPLC cells are shown in (D). (E, F) The nine layers of the fan-shaped body (FB), along with the asymmetrical bodies (AB) and the noduli (NO), displayed as an anterior-ventral view (E), and a lateral view (F). In (E), three FB tangential cells (FB1D (blue), FB3A (green), FB7L (purple)) are shown as markers of the corresponding layers (FBl1, FBl3, and FBl7, respectively). (G) Zones in the ellipsoid body (EB) defined by different types of ring neurons. In this horizontal section of the EB, the left side shows the original grayscale data, and the seven ring neuron subtypes are color-coded. The right side displays the seven segmented zones based on the innervation pattern. Scale bar: 20µm.
The VLNP is located in the lateral part of the central brain and receives extensive inputs from the optic lobe through various types of the visual projection neurons (VPNs). Among them, the projection sites of the lobula columnar (LC), lobula plate columnar (LPC), lobula-lobula plate columnar (LLPC), and lobula plate-lobula columnar (LPLC) cells form characteristic glomerular structures, or the optic glomeruli (OG), in the AOTU, PVLP, and PLP(Klapoetke et al., 2017)(Otsuna and Ito, 2006)(Panser et al., 2016)(Wu et al., 2016). We exhaustively identified columnar VPNs and found 23 types of LC, two types of LPC, three types of LLPC, and three types of LPLC cells. The glomeruli of these pathways were used to determine the medial boundary of the PVLP and PLP, following existing definitions(Ito et al., 2014), except for a few LC types which do not form glomerular terminals. The terminals of the reconstructed LC cells and other lobula complex columnar cells (LPC, LLPC, LPLC) are shown in Figures 8C and 8D, respectively.
In the previous paper(Ito et al., 2014), the boundary between the CRE and LAL was defined as the line roughly corresponding to the posterior-ventral surface of the MB lobes, since no other prominent anatomical landmarks were found around this region. In this dataset, we found several glomerular structures surrounding the boundary both in the CRE and LAL. These structures include the gall (GA), rubus (RUB), and round body (ROB). Most of them turned out to be projection targets of several classes of central complex neurons, implying the ventral CRE and dorsal LAL are closely related in their function. We re-determined the boundary so that each of the glomerular structures would not be divided into two, while keeping the overall architecture and definition of the CRE and LAL. The updated boundary passes between the dorsal surface of the GA and the ventral edge of the ROB. Other glomerular structures, including the RUB, are included in the CRE.
Delineated neuropils
Substructures of the delineated neuropils have also been added to the brain region map in the hemibrain. The asymmetrical bodies (AB) were added as the fifth independent neuropil of the CX(Wolff and Rubin, 2018). The AB is a small synaptic volume adjacent to the ventral surface of the fan-shaped body (FB) that has historically been included in FB(Ito et al., 2014). The AB has been described as a fasciculin II (fasII)-positive structure that exhibits left-right structural asymmetry by Pascual et al.(Pascual et al., 2004), who reported that most flies have their AB only in the right hemisphere, while a small proportion (7.6%) of wild type flies have their AB on both sides. In the hemibrain dataset, a pair of ABs is situated on both sides of the midline, but the left AB is notably smaller than the right AB (right: 1,467 µm³, left: 452 µm³), still showing an obvious left-right asymmetry. The AB is especially strongly connected to the neighboring neuropil, the FB, by neurons including Delta0A, Delta0B, and Delta0C, while it also houses postsynaptic terminals of the CX output neurons including FQ12a(Wolff and Rubin, 2018). While these anatomical observations imply that the AB is part of the central body (CB), along with the FB and the ellipsoid body (EB), this possibility is neither developmentally nor phylogenetically proven.
The round body (ROB) is also a small round synaptic structure situated on the ventral limit of the crepine (CRE), close to the β lobe of the MB (Lin et al., 2013)(Wolff and Rubin, 2018). It is a glomerulus-like structure and one of the foci of the CX output neurons, including the PFR (protocerebral bridge – fan-shaped body – round body) neurons. It is classified as a substructure of the CRE along with other less-defined glomerular regions in the neuropil, many of which also receive signals from the CX. Among these, the most prominent one is the rubus (RUB). These are two distinct structures; the RUB is embedded completely within the CRE, while the ROB is located on the ventrolateral surface of the CRE. The lateral accessory lobe (LAL), neighboring the CRE, also houses similar glomerular terminals, and the gall (GA) is one of them. While the ROB and GA have relatively clear boundaries separating them from the surrounding regions, they may not qualify as independent neuropils because of their small size and the structural similarities with the glomerulus-like terminals around them. They may be comparable with other glomerular structures such as the AL glomeruli and the optic glomeruli in the lateral protocerebrum, both of which are considered as substructures of the surrounding neuropils.
Substructures of independent neuropils are also defined using neuronal innervations. The five MB lobes on the right hemisphere are further divided into 15 compartments (α 1-3, α’1-3, β1-2, β’1-2, and γ 1-5)(Tanaka et al., 2008)(Aso et al., 2014) by the mushroom body output neurons (MBONs) and dopaminergic neurons (DANs). Our compartment boundaries were defined by approximating the innervation of these neurons. Although the innervating regions of the MBONs and DANs do not perfectly tile the entire lobes, the compartments have been defined to tile the lobes, so every synapse in the lobes belongs to one of the 15 compartments. The FB is subdivided into nine horizontal layers (FBl1-9) (Figure 8E and 8F) as already illustrated(Wolff et al., 2015). They are determined by the pattern of innervation of 480 FB tangential cells, which form nine groups depending on the dorsoventral levels they innervate in the FB. While neurons innervating neighboring layers may overlap slightly, the layer boundaries were drawn so that the coverage of the tangential arbors by each layer was maximized.
The EB is likewise subdivided into zones by the innervating patterns of the EB ring neurons, the most prominent class of neurons innervating the EB. The ring neurons have six subtypes, R1-R6, and each projects to specific zones of the EB. Among them, the regions innervated by R2 and R4 are mutually exclusive but highly intermingled, so these regions are grouped together into a single zone (EBr2r4). R3 has the most neurons among the ring neuron subtypes and is further grouped into five subclasses. While each subclass projects to a distinct part of the EB, the innervation patterns of the subclasses R3a and R3m, and also R3p and R3w, are very similar to each other. The region innervated by R3 is, therefore, subdivided into three zones: EBr3am, EBr3pw, and EBr3d. Along with the other three zones, EBr1, EBr5, and EBr6, the entire EB is subdivided into seven non-overlapping zones (Figure 8G). Unlike other zones, EBr6 is innervated only sparsely by the R6 cells, and the space is mainly filled by synaptic terminals of other neuron types, including the extrinsic ring neurons (ExR). Omoto et al.(Omoto et al., 2017) segmented the EB into five domains (EBa, EBoc, EBop, EBic, EBip) by the immunolabeling pattern of DN-cadherin, and each type of the ring neurons may innervate more than one domain in the EB. Our results show that the innervation pattern of each ring neuron subtype is highly compartmentalized at the EM level and the entire neuropil can be sufficiently subdivided into zones based purely on the neuronal morphologies. The neuropil may be subdivided differently if other neuron types, such as the extrinsic ring neurons (ExR)(Omoto et al., 2018), are recruited as landmarks.
Quality of the brain region boundaries
Since many of the terra incognita neuropils are not clearly partitioned from each other by solid boundaries such as glial walls, it is important to evaluate if the current boundaries reflect anatomical and functional compartments of the brain. We first measured the relative sizes of the boundaries between any two adjacent neuropil regions (Figure 9A). The map shows results for brain regions that lie more than 75% within the hemibrain volume, restricted to right-side regions with the exception of the asymmetric AB(L). For these regions, we counted the number of wire crossings by large traced neurons and estimated a cost. A bigger dot indicates a higher cost or a less clean boundary. We do not penalize neurons that cross a boundary once, but rather penalize when a neuron crosses the same boundary multiple times. By restricting our analysis to the right part of the hemibrain, we aim to minimize the effect of smaller, traced-but-truncated neuron fragments on our score. Figure 9B shows the number of intersections normalized by the area of boundary. We spot checked many of the instances and in general note that the brain regions with a high cost, such as those in SNP, INP and VLNP, tend to have less well defined boundaries. In particular, the boundaries at SMP/CRE, CRE/LAL, SMP/SIP, and SIP/SLP have worse scores, indicating these boundaries may not reflect actual anatomical and functional segregation of the neuropils. These brain regions were defined based on the arborization patterns of characteristic neuron types, but because neurons in the terra incognita neuropils tend to be rather heterogeneous, there are many other neuron types that do not follow these boundaries. The boundaries between the FB and AB also give relatively bad scores, and this suggests that the AB is tightly linked to the neighboring FB.
Quality check of the brain compartments. (A) The relative sizes of the boundaries between adjacent neuropils indicated in a log scale. (B) The number of neuronal intersections normalized by the area of neuropil boundary.
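The boundary score described above can be sketched as follows. This is an illustrative simplification rather than the procedure used for Figure 9: each neuron is reduced to a single ordered path of region labels (real neurons are branching trees), the cost is assumed to be the number of crossings beyond the first, and the function names and toy values are hypothetical.

```python
from collections import Counter

def crossings(region_path):
    """Count crossings of each boundary along one ordered path of region labels."""
    counts = Counter()
    for a, b in zip(region_path, region_path[1:]):
        if a != b:
            counts[frozenset((a, b))] += 1
    return counts

def boundary_costs(paths, boundary_area):
    """Sum the excess crossings (beyond the first) per boundary, normalized by area."""
    cost = Counter()
    for path in paths:
        for boundary, n in crossings(path).items():
            cost[boundary] += max(0, n - 1)   # a single crossing is not penalized
    return {b: cost[b] / boundary_area[b] for b in boundary_area}

# Toy data: three neurons and two boundaries with made-up areas.
paths = [
    ["SMP", "CRE", "SMP", "CRE"],   # crosses SMP/CRE three times
    ["SMP", "CRE"],                 # crosses once, no penalty
    ["CRE", "LAL", "CRE"],          # crosses CRE/LAL twice
]
boundary_area = {frozenset(("SMP", "CRE")): 2.0, frozenset(("CRE", "LAL")): 4.0}
for boundary, score in boundary_costs(paths, boundary_area).items():
    print(sorted(boundary), score)
```

A neuron that passes from one region to the next and stays there contributes nothing, so the score rises only where arbors weave back and forth across a boundary.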
Insights for a whole-brain remapping
The current brain regions based on Ito et al. (Ito et al., 2014) contain a number of arbitrary determinations of brain regions and their boundaries in the terra incognita neuropils. In this study, we tried to solidify the ambiguous boundaries as much as possible using the information from the reconstructed neurons. However, large parts of the left hemisphere and the subesophageal zone (SEZ) are missing from the hemibrain dataset, and neurons innervating these regions are not sufficiently reconstructed. This incompleteness of the dataset is the main reason that we did not alter the previous map drastically and kept all the existing brain regions even if their anatomical and functional significance is not obvious. Once a complete EM volume of the whole fly brain is imaged and most of its 100,000 neurons are reconstructed, the entire brain can be re-segmented from scratch with more comprehensive anatomical information. Arbitrary or artificial neuropil boundaries will thereby be minimized, if not avoided, in a new brain map. Anatomy-based neuron segmentation strategies such as NBLAST may be used as neutral methods to revise the neuropils and their boundaries. Any single method, however, is not likely to produce consistent boundaries throughout the brain, especially in the terra incognita regions. It may be necessary to use different methods and criteria to segment the entire brain into reasonable brain regions. Such a new map would need discussion in a working group, and approval from the community in advance (as did the previous map(Ito et al., 2014)), insofar as it would replace the current map and therefore require a major revision of the neuron mapping scheme.
Cell Type Classification
Defining cell types for groups of similar neurons is a time-honored means to attempt to understand the anatomical and functional properties of a circuit. Presumably, neurons of the same type execute similar circuit roles. However, the definition of what is a distinct cell type and the exact delineation between one cell type and another is inherently vague and represents a classic taxonomic challenge, pitting ‘lumpers’ vs ‘splitters’. Despite our best efforts, we recognize that our typing of cells is not exact, and expect future revisions to cell type classification.
One common method of cell type classification, used in flies, exploits the GAL4 system to highlight the morphology of neurons having similar gene expression(Jenett et al., 2012). Since these genetic lines are imaged using fluorescence and confocal microscopy, we refer to them as ‘light lines’. Where they exist and are sufficiently sparse, light lines provide a key method for identifying types by grouping morphologically similar neurons together. However, there are several limitations. There are no guarantees of coverage, and it is sometimes difficult to distinguish between neurons of very similar morphology but different connectivity.
We enhanced the classic view of morphologically distinct cell types by defining distinct cell types (or sub-cell types) based on morphology and connectivity. Connectivity-based clustering often serves as a clear arbiter of cell type distinctions, even when genetic markers have yet to be found, or when the morphology of different types is quite similar, sometimes sufficiently similar to be indistinguishable in optical images. For example, the two PEN (protocerebral bridge - ellipsoid body - noduli) neurons have very similar forms but quite distinct inputs (Figure 10)(Turner-Evans et al., 2019). Confirming their differences, PEN1 and PEN2 neurons, in fact, have been shown to have different functional activity(Green et al., 2017).
An example of two neurons with very similar shapes but differing connectivities.
Workflow for defining cell types
Based on our previous definition of cell type, many neurons exhibit a unique morphology or connectivity pattern at least within one hemisphere of the brain (presumably with a matching type in the other hemisphere). Therefore, in our hemibrain reconstruction, many neuron types consisting of a distinct morphology and connectivity have only a single example. It is possible in principle to provide coarser groupings of neurons. For instance, most cell types are grouped by their cell body fiber representing a distinct clonal unit, which we discuss in more detail below. Furthermore, each neuron can be grouped with neurons that innervate similar brain regions. In this paper, we do not explicitly formalize this higher-level grouping, but data on the innervating brain regions can be readily mined from the dataset.
Methodology for assigning cell types and nomenclature
Assigning names and types to the more than 20,000 reconstructed cells was a difficult and contentious undertaking. Many of the neurons have no previously annotated type. Adding to the complexity, prior work focused on morphological similarities and differences, but here we have, for the first time, connectivity information to assist in cell typing as well.
Most cell types for the visual projection neurons (VPNs), mushroom body (MB) neurons and central complex (CX) neurons are already described in the literature, but the existing names can be both inconsistent and ambiguous. The same cell type is often given differing names in different publications, and conversely, the same name, such as PN for projection neuron, is used for many different cell types. Nonetheless, for cell types already named in the literature (which we designate as famous cell types), we have tried to use an existing name. We apologize in advance for any offense given by our selection of names.
Overall, we defined a ‘type’ of neurons as either a single cell or a group of cells that have a very similar cell body location, morphology, and pattern of synaptic connectivity. We found 18,478 neuronal cell bodies in the hemibrain volume, most of which are located in the right side of the brain.
We classified these neurons in a few steps. The first step classified all cells by their lineage, grouping neurons according to their bundle of cell body fibers (CBFs). Neuronal cell bodies are located in the cell body layer that surrounds the brain, and each neuron projects a single CBF towards a synaptic neuropil. In the central brain, cell bodies of clonally related neurons deriving from a single stem cell tend to form clusters, from each of which arises one or several bundles of CBFs. We carefully examined the trajectory and origins of CBFs of the 15,532 neurons on the right central brain and identified 192 distinct CBF bundles. Among them, 154 matched the CBF bundles of 102 known clonal units(Ito et al., 2013)(Lin et al., 2013). The rest are minor populations and most likely of embryonic origin.
Different stem cells sometimes give rise to neurons with very similar morphologies. We classified these as different types because of their distinct developmental origin and slightly different locations of their cell bodies and CBFs. Thus, the next step in neuron typing was to cluster neurons within each CBF group. This process consisted of three further steps. First, we used NBLAST(Costa et al., 2016) to subject all the neurons of a particular CBF group to morphology-based clustering. Next, we used CBLAST, a new tool to cluster neurons based on synaptic connectivity (see below). This step is an iterative process, using neuron morphology as a template, to regroup neurons after more careful examination of neuron projection patterns and their connections. Finally, we validated the cell typing with extensive manual review and visual inspection. This review allowed us to confirm cell type identity and also helped ensure neuron reconstruction accuracy.
In the hemibrain, using the defined brain regions and reference to known expression driver strains, we were able to assign a cell type to many cells. Where possible, we matched previously defined cell types with those labeled in light data using a combination of Neuprint, an interactive analysis tool (described below), and human recognition to find the matching cell types, especially in well explored neuropils such as the mushroom body (MB) and central complex (CX), where abundant cell type information was already available and where we are more confident in our anatomical expertise. Even though most of the cell types in the MB and CX were already described, we still found new cell types in these regions, an important vindication of our methods. In these cases we tried to name them using the existing schemes for these regions, and further refined these morphological groupings with relevant information on connectivity.
Outside the heavily studied regions, the fly’s circuits are largely composed of cells of unknown type. In this case putative type names were derived from a) the CBF group, b) the morphological type, and c) the connectivity type.
- Each of the 192 CBF bundles was given an ID according to the location of the cell body cluster (split into eight sectors of the brain surface with the combination of Anterior/Posterior, Ventral/Dorsal, and Medial/Lateral) and a number within the sector given according to the size of cell population. Thus, a CBF group might be named ADM01, meaning a group with the largest number of neurons in the Anterior Dorsal Medial sector of the brain’s surface.
- Morphological types were represented by the CBF group name followed by 1-3 lowercase letters, e.g. ADM01a.
- If neurons of near-identical morphology could be further subdivided into different connectivity types, they were suffixed with an underscore and a lowercase letter, e.g. ADM01a_b.
Finally, a suffix ‘_pct’, for putative cell type, was added. Thus, a full putative type name might be ‘ADM01a_pct’ if all the neurons of this type shared similar connectivity patterns, or ‘ADM01b_a_pct’ and ‘ADM01b_b_pct’ if there are different connectivity types within neurons having a similar form. The resulting names may lack elegance, but the process is systematic and scalable.
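The naming scheme above can be summarized in a short sketch. The function name and argument names are hypothetical, and the two-digit zero padding of the group number is an assumption inferred from the ‘ADM01’ example.

```python
def putative_type_name(sector, rank, morph, conn=None):
    """Compose a putative cell type name from its three components.

    sector: three letters, Anterior/Posterior + Dorsal/Ventral + Medial/Lateral
    rank:   number of the CBF group within the sector (1 = largest population)
    morph:  1-3 lowercase letters identifying the morphological type
    conn:   optional single lowercase letter identifying the connectivity type
    """
    if len(sector) != 3 or sector[0] not in "AP" or sector[1] not in "DV" or sector[2] not in "ML":
        raise ValueError(f"invalid sector: {sector!r}")
    if not (1 <= len(morph) <= 3 and morph.isascii() and morph.isalpha() and morph.islower()):
        raise ValueError(f"invalid morphological type: {morph!r}")
    name = f"{sector}{rank:02d}{morph}"
    if conn is not None:
        if not (len(conn) == 1 and conn.isascii() and conn.islower()):
            raise ValueError(f"invalid connectivity type: {conn!r}")
        name += f"_{conn}"
    return name + "_pct"

print(putative_type_name("ADM", 1, "a"))        # ADM01a_pct
print(putative_type_name("ADM", 1, "b", "a"))   # ADM01b_a_pct
```

Because every component is generated mechanically, new types can be added without renaming existing ones, at the price of names that carry little biological meaning.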
The assignment of type names to neurons is still ongoing, and we expect the names of putative cell types will be refined by the research community, including simpler names that are easier to pronounce, as new information emerges. What will not change are the unique body ID numbers given in the database that refer to a particular (traced) cell in this particular image dataset. We strongly advise that such IDs be included in any publications based on our data to avoid confusion as cell type names (and possibly instance names) evolve.
CBLAST
As part of our effort to assign cell types, we built a tool for cell type clustering based on neuron connectivity, called CBLAST (by analogy with the existing NBLAST(Costa et al., 2016), which forms clusters based on the shapes of neurons). The tool is described in more detail in Figure 12.
Overview of the operation of CBLAST
Partitioning a network into clusters of nodes that exhibit similar connectivity is known as community detection or graph clustering(Fortunato and Hric, 2016). Numerous methods have been proposed for selecting such partitions, the best known being the stochastic block model. To non-theoreticians, the process by which most methods choose a partitioning is not intuitive, and the results are not easily interpretable. Furthermore, most approaches do not readily permit a domain expert to guide the partitioning based on her intuition or on other features of the nodes that are not evident in the network structure itself. In contrast, CBLAST is based on traditional data clustering concepts, leading to more intuitive results. Additionally, a user can apply their domain expertise by manually refining the partitioning during successive iterations of the procedure. This is especially useful in the case of a network like ours, in which noise and missing data make it difficult to rely solely on connectivity to find a good partitioning automatically. Additionally, other graph clustering methods do not accommodate the notion of left-right symmetry amongst communities, a feature that is critical for assigning cell types in a connectome.
CBLAST clusters neurons together using a similarity feature score defined by how the neuron distributes inputs and outputs to different neuron types. However, this is a circular requirement since neuron types must already be defined to use this technique. CBLAST therefore uses an iterative approach, refining cell type definitions successively. Initial cell type groups are putatively defined using an initial set of features based on morphological overlap as in NBLAST and/or based on the distribution of inputs and outputs in defined brain regions. These initial groups are fed into CBLAST in which the user can visualize and analyze the results using plots such as that in Figure 13. Given the straightforward similarity measure, the user can look at the input and output connections for each neuron to better understand the decision made by the clustering algorithm. As the cell type definitions are improved, the clustering becomes more reliable. In some cases, this readily exposes incompleteness (e.g., due to the boundary of the hemibrain sample) in some neurons which would complicate clustering even for more computationally intensive strategies such as a stochastic block model. Based on these interactions, the user makes decisions and refines the clusters manually, iterating until further changes are not observed.
Cells of five types plotted according to their connectivities. Coordinates are in arbitrary units after dimensionality reduction using UMAP(McInnes et al., 2018). The results largely agree with those from morphological clustering but in some cases show separation even between closely related types.
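To make the similarity feature concrete, the following is a minimal sketch in the spirit of CBLAST, not its actual implementation. It assumes that each neuron is described by its synapse counts to and from each currently assigned type and that two neurons are compared by cosine similarity; the data structures, the function names, and the choice of cosine similarity are all illustrative assumptions.

```python
import math
from collections import defaultdict

def type_features(synapses, neuron_type):
    """Build per-neuron feature vectors keyed by (direction, partner type).

    synapses:    {(pre, post): synapse count}
    neuron_type: {neuron: current (possibly provisional) type label}
    """
    feats = {n: defaultdict(float) for n in neuron_type}
    for (pre, post), count in synapses.items():
        feats[pre][("out", neuron_type[post])] += count
        feats[post][("in", neuron_type[pre])] += count
    return feats

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy network: n0 and n1 both target x, which targets y.
synapses = {("n0", "x"): 10, ("n1", "x"): 8, ("x", "y"): 5}
neuron_type = {"n0": "A", "n1": "A", "x": "X", "y": "Y"}
feats = type_features(synapses, neuron_type)
print(cosine(feats["n0"], feats["n1"]))   # same partner types
print(cosine(feats["n0"], feats["x"]))    # no partner types in common
```

In an iterative setting the type labels would be revised after each round, by the user or by a clustering step, and the features recomputed until the grouping stops changing.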
Our large, dense connectome is a key requirement for CBLAST. Unless a significant fraction of a neuron’s inputs and outputs is known, neurons that are in fact similar may not cluster together correctly. This requirement is not absolute, as we note that CBLAST is often able to match left and right symmetric neurons, despite some of these left side neurons being truncated by the boundaries of the dataset. Nonetheless, reconstruction incompleteness and any noise in the reconstruction can contribute to noise in clustering results.
CBLAST usually generates clusters that are consistent with the morphological groupings of the neurons, with CBLAST often suggesting new sub-groupings as intended. This agreement serves as some validation of the concepts behind CBLAST. In some cases it can be preferable to NBLAST, since the algorithm is less sensitive to exact neuron location, and for many applications the connectivity is more important than the morphology. In Figure 13, we show the results of using CBLAST on a few neuron types extracted from the ellipsoid body. The clusters are consistent with the morphology, with the exception of a new sub-grouping for R3p being suggested as a more distinct group than type ExR7/ExR6.
Results of cell typing
Using the above semi-automated procedures, we identified 55 types for VPNs, 159 types in the antennal lobe (AL), 68 types in MB, and 264 types in CX, which in aggregate apply to a total of 10,734 neurons (note that cells in CX are counted for both right and left sides) (Table 2). For the remaining ≈10,000 neurons in the other brain regions, over 4000 cell types were identified. Over a thousand of these are types with only a single instance, although presumably, for a whole brain reconstruction, most of these types would have partners on the opposite side of the brain. Figure 14 shows the number of distinct neuron types found in different brain regions. Figure 15 shows the distribution of the number of neurons in each cell type.
The number of cell types in each major brain region. The sum of cell types in the graph is larger than the total number of cell types, because a single cell type may contribute to many regions.
Histogram showing the number of cell types with a given number of constituent cells.
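A histogram of this kind can be derived directly from a table of type assignments. The sketch below uses a hypothetical mapping from body ID to type name with placeholder values; it does not reproduce the counts reported above.

```python
from collections import Counter

def type_size_histogram(neuron_types):
    """Map each type size (cells per type) to the number of types of that size."""
    cells_per_type = Counter(neuron_types.values())
    return Counter(cells_per_type.values())

# Placeholder assignments: body ID -> type name.
neuron_types = {1: "typeA", 2: "typeA", 3: "typeB", 4: "typeC", 5: "typeC", 6: "typeD"}
hist = type_size_histogram(neuron_types)
print(sorted(hist.items()))
print("types with a single instance:", hist[1])
```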
Assessing Morphologies and Cell Types
Verifying correctness and completeness in these data is a challenging problem because no full brain connectome exists against which our data might be compared. We devised a number of tests to check the main features: Are the morphologies correct? Are the regions and cell types correctly defined? Are the synaptic connection counts representative?
Assessing completeness is much easier than assessing correctness. Since the reconstruction is dense, we believe the census of cells, types, and regions should be essentially complete. The main arbors of every cell within the volume are reconstructed, and almost every cell is assigned to at least a putative cell type. Similarly, since the identified brain regions nearly tile the entire brain, these are complete as well.
For checking morphologies, we searched for major missing or erroneous branches using a number of heuristics. Each neuron was reviewed by multiple proofreaders. The morphology of each neuron was compared with light microscopy data whenever it was available. When more than one cell of a given type was available (either left and right hemisphere, or multiple cells of the same type in one hemisphere), a human examined and compared them. This helped us find missing or extra branches, and also served as a double check on the cell type assignment. In addition, since the reconstruction is dense, all sufficiently large “orphan” neurites were examined manually until they were determined to form part of a neuron, or they left the volume. To help validate the assigned cell types, proofreaders did pairwise checks of every neuron with types that had been similarly scored.
For subregions in which previous dense proofreading was available (such as the alpha lobes of the mushroom body) we compared the two connectomes. We were also helped by research groups using both sparse tracing in the full fly brain TEM dataset(Zheng et al., 2018), and our hemibrain connectome. They were happy to inform us of any inconsistencies. There are limits to this comparison, as the two samples being compared were of different ages and raised under different conditions, then prepared and imaged by different techniques, but this comparison would nevertheless have revealed any gross errors. Finally, we generated a ‘probabilistic connectome’ based on a different segmentation, and systematically visited regions where the two versions differed.
Assessing Synapse Accuracy
As discussed in the section on finding synapses, we evaluated both precision (the fraction of found synapses that are correct) and recall (fraction of true synapses that were correctly predicted) on sample cubes in each brain region. We also double checked by comparing our findings with a different, recently published, synapse detection algorithm(Buhmann et al., 2019).
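The two measures defined above can be written out explicitly. The following is a minimal sketch, assuming that a proofread sample cube yields counts of correct predictions, spurious predictions, and missed synapses; the function name and the example counts are hypothetical and are not taken from our evaluation.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from counts on a proofread sample cube."""
    predicted = true_positives + false_positives   # all synapses the detector found
    actual = true_positives + false_negatives      # all synapses really present
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
p, r = precision_recall(true_positives=80, false_positives=20, false_negatives=10)
print(f"precision={p:.2f} recall={r:.2f}")
```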
As a final check, we also evaluated the end-to-end correctness of given connections between neurons for different cell types and across brain regions. Specifically, for each neuron, we sampled 25 upstream connections (T-bar located within the neuron) and 25 downstream connections (PSD located within the neuron), and checked whether the annotations were correct, meaning that the pre/post annotation was valid and assigned to the correct neuron.
In total, we examined 1735 traced neurons spanning 1518 unique cell types (therefore examining 43k upstream connections and 43k downstream connections). The histogram of synapse accuracy (end-to-end precision of predicted synapses) is given in Figure 16. Median precision for upstream connections, as well as for downstream connections, is 88%. Additionally, 90% of cell types have an accuracy of at least 70%. For the few worst cases, we manually refined the synapse predictions afterwards. We note that the worst outlier, having an upstream connection accuracy of 12%, is both a case involving few total connections (17 T-bars), and some ambiguity in the ground-truth decisions (whether the annotated location is an actual T-bar).
Connection precision of upstream and downstream partners for ≈ 1000 cell types.
We also evaluated single-connection pathways across each brain region. In the fly, functionally important connections are thought typically to have many synapses, with the possible exception of cases where many neurons of the same type synapse onto the same downstream partner. However, the presence of connections represented by few synapses is also well known, even if the biological importance of these is less clear. Regardless, we wanted to ensure that even single connection pathways were mostly correct. We sampled over 5500 single-connection pathways, distributed across 57 brain regions. Mean synapse precision per brain region was 76.1%, suggesting that single-connection accuracy is consistent with overall synapse prediction accuracy.
We also undertook a preliminary evaluation of two-connection pathways (two synapses between a single pair of bodies). We sampled 100 such two-connection pathways within the FB. Overall synapse precision (over the 200 synapses) is 79%, consistent with the single-edge accuracy. Moreover, the results also suggest that synapse-level accuracy is largely uncorrelated with pathway/bodies, implying that the probability that both synapses in a two-connection pathway were incorrect is 4.4% ($(1 - 0.79)^2$), close to the observed empirical value of 3%. (Applying a $\chi^2$ goodness of fit test with a null hypothesis of independence gives a p value of 0.7.)
Assessing connection completeness
A synapse in the fly’s brain consists of a presynaptic density (with a characteristic T-bar) and typically several postsynaptic partners (PSDs). The T-bars are contained in larger neurites, and most (>90%) of the T-bars in our dataset were contained in identified neurons. The postsynaptic densities are typically in smaller neurites, and it is these that are difficult for both machine and human to connect with certainty.
With current technology, tracing all fine branches in our EM images is impractical, so we sample among them (at completeness levels typically ranging from 20% to 85%) and trace as many as practical in the allotted time. The goal is to provide synapse counts that are representative, since completeness is beyond reach and largely superfluous. Provided the missing PSDs are independent (which we try to verify), then the overall circuit emerges even if a substantial fraction of the connections are missing. If a connection has a strength of 10, for example, then it will be found in the final circuit with more than 99.9% probability, provided at least half the individual synapses are traced.
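The 99.9% figure follows from the independence assumption: a connection is missed only if every one of its synapses is untraced. A minimal sketch of that arithmetic is given here; the function name is ours, and the 20% case is simply the lower end of the completeness range quoted above.

```python
def detection_probability(n_synapses, traced_fraction):
    """Probability that at least one of n independent synapses is traced."""
    return 1.0 - (1.0 - traced_fraction) ** n_synapses


# A connection of strength 10 with half of its synapses traced.
print(detection_probability(10, 0.5))   # 0.9990234375

# The same connection at the lower end of the completeness range (20%).
print(round(detection_probability(10, 0.2), 3))
```

The same expression shows why weak connections are much less certain: with a single synapse, the detection probability is just the traced fraction.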
If unconnected small twigs are the main source of uncertainty in our data (as we believe to be the case), then as proofreading proceeds existing connections should only get stronger. Of course corrections resulting in lower connection strength, such as correcting a false connection or removing an incorrect synapse, are also possible, but are considerably less likely. To see if our proofreading process worked as expected, we took a region that had been read to a lower percentage completion and then spent the manual effort to reach a higher percentage, and compared the two circuits. (A versioned database such as DVID is enormously helpful here.) If our efforts were successful, ideally what we see is that almost all connections that changed got stronger, very few connections got weaker, and no new strong connections appeared (since all strong connections should already be present even in low coverage proofreading). If this is the behavior we find, we could be reasonably certain that the circuits found are representative for all strong connections.
Figure 17 below shows such an analysis. The results support our view that the circuits we report reflect what would be observed if we extrapolated to assign all pre- and postsynaptic elements.
Difference between connection strengths in the Ellipsoid Body with increased completeness in proofreading. Roughly 40,000 paths are shown. Almost all points fall above the line Y=X, showing that almost all paths increased in strength, with very few decreasing. In particular, no path decreased in strength by more than 5 synapses. Only two new strong (strength > 10) paths were found that were not present in the original. This should be rarer at higher levels of proofreading since neuron fragments (orphans) are added in order of decreasing size (see text).
Interpreting the connection counts
Given the complexity of the reconstruction process, and the many different errors that could occur, how confident should the user be that the returned synapse counts are valid? This section gives a quick guide in the absence of detailed investigation. The number of synapses we return is the number we found. The true number could range from slightly less, largely due to false synapse predictions, to considerably more, in the regions with low percentage reconstructed. For connections known to be in a specific brain region, the reciprocal of the completion percentage (as shown in Table 1) gives a reasonable estimate of the undercount.
If we return a count of 0 (the neurons are not connected), there are two cases. If the neurons do not share any brain regions, then the lack of connections is real. If they do share a brain region or regions, then a count of 0 is suspect. It is possible that there might be a weak connection (count 1-2) and less likely there is a connection of medium strength (3-9 synapses). Strong connections can be confidently ruled out, minus the small chance of a mis- or un-assigned branch with many synapses. If we report a weak connection (1-2 synapses), then the true strength might range from 0 (the connection does not exist) through a medium-strength connection (3-9 synapses). If your model or analysis relies on the strength of these weak connections, it is a good idea to manually check our reconstruction. If your analysis does not depend on knowledge of weak connections, we recommend ignoring connections based on 3 or fewer synapses.
If we report a medium strength connection (3-9 synapses) then the connection is real. The true strength could range from weak to the lower end of a strong connection.
If we report a strong connection (10 or more synapses), the connection not only exists, but is strong. It may well be considerably stronger than we report.
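The rules of thumb in this section can be summarized in a few lines of code. This is only a sketch of the guide above, not part of our software: the function names are hypothetical, the strength classes use the thresholds given in the text (1-2 weak, 3-9 medium, 10 or more strong), and the undercount correction simply divides by the completion fraction of the relevant brain region.

```python
def strength_class(count):
    """Classify a reported synapse count using the thresholds in the text."""
    if count == 0:
        return "none"
    if count <= 2:
        return "weak"
    if count <= 9:
        return "medium"
    return "strong"


def estimated_true_count(count, completion_fraction):
    """Scale a reported count by the reciprocal of the region's completion."""
    if not 0.0 < completion_fraction <= 1.0:
        raise ValueError("completion_fraction must be in (0, 1]")
    return count / completion_fraction


# A reported count of 10 in a region that is 50% complete (hypothetical).
print(strength_class(10), estimated_true_count(10, 0.5))
```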
Data Representation
The representation of connectomics data is a significant problem for all connectomics efforts. The raw image data on which our connectome is based is larger than 20 TB, and takes 2 full days to download even at a rate of 1 gigabit/second. Looking forward, this problem will only get worse. Recent similar projects are generating petabytes worth of data(Yin et al., 2019), and a mouse brain of 500 mm³, at a typical FIB-SEM resolution of 8 nm isotropic, would require almost 1000 petabytes.
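These size estimates are easy to reproduce. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and one byte of grayscale per voxel; the function names are ours.

```python
def download_days(size_bytes, bits_per_second):
    """Days needed to transfer size_bytes at a sustained bit rate."""
    seconds = size_bytes * 8 / bits_per_second
    return seconds / 86400


def voxel_count(volume_mm3, voxel_nm):
    """Number of cubic voxels of edge voxel_nm that fill volume_mm3."""
    nm_per_mm = 1_000_000
    return volume_mm3 * nm_per_mm ** 3 / voxel_nm ** 3


# 20 TB at 1 gigabit/second: a little under two days.
print(round(download_days(20e12, 1e9), 2))

# 500 mm^3 at 8 nm isotropic, one byte per voxel, expressed in petabytes.
print(round(voxel_count(500, 8) / 1e15))
```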
In contrast, most users of connectivity information want a far smaller amount of much more specific information. For example, a common query is ‘what neurons are downstream (or upstream) of a given target neuron?’. This question can be expressed in a few tens of characters, and the desired answer, the top few partners, fits on a single page of text.
Managing this wide range of data, from the raw gray-scale through the connectivity graph, requires a variety of technologies. An overview of the data representations we used to address these needs is shown in Figure 18. This organization offers several advantages. In most cases, instead of transferring files, the user submits queries for the portion of data desired. If the user needs only a subset of the data (as almost all users do) then they need not cope with the full size of the data set. Different versions of the data can be managed efficiently behind the scenes with a versioned database such as DVID(Katz and Plaza, 2019) that keeps track of changes and can deliver data corresponding to any previous version. The use of existing software infrastructure, such as Google buckets or the graph package neo4j, which are already optimized for large data, helps with both performance and ease of development. The advanced user is not limited to these interfaces - for those who may wish to validate or extend our results, we have provided procedures whereby the user can make personal copies of each representation, including the grayscale, the DVID data storage, and our editing and proofreading software. These allow other researchers to establish an entirely independent version of all we have done, completely under their control. Contact the authors for the details of how to copy all the underlying data and software.
Overview of data representations of our reconstruction. Circles are stored data representations, rectangles are application programs, ellipses represent users, and arrows indicate the direction of data flow labeled with transformation and/or format. Filled areas represent existing technologies and techniques; open areas were developed for the express purpose of EM reconstruction of large circuits.
What are the data types?
Grayscale data correspond to traditional electron microscope images. This is written only once, after alignment, but often read, because it is required for segmentation, synapse finding, and proofreading. We store the grayscale data, 8 bits per voxel, in Google buckets, which facilitates access from geographically distributed sites.
Segmentation, synapses, and identifying regions annotate and give biological meaning to the grayscale data. For segmentation, we assign a 64 bit neuron ID to each voxel. Despite the larger size per voxel (64 vs 8 bits) compared with the grayscale, the storage required is much smaller (by a factor of more than 20) since segmentation compresses well. Although the voxel level segmentation is not needed for connectivity queries, it may be useful for tasks such as computing areas and cross-sections at the full resolution available, or calculating the distance between a feature and the boundary.
Synapses are stored as point annotations - one point for a presynaptic T-bar, and one point for each of its postsynaptic densities (or PSDs). The segmentation can then be consulted to find the identity of the neurons containing their connecting synapses.
The compartment map of the brain is stored as a volume specified at a lower resolution, typically a 32×32×32 voxel grid. At 8 nm voxels, this gives a 256 nm resolution for brain regions, comparable to the resolution of confocal laser scanning microscopy.
Unlike the grayscale data, segmentation, synapses, and regions are all modified during proof-reading. This requires a representation that must cope with many users modifying the data simultaneously, log all changes, and be versioned. We use DVID(Katz and Plaza, 2019), developed internally, to meet these requirements.
Neuron skeletons are computed from the segmentation(Zhao and Plaza, 2014), and not entered or edited directly. A skeleton representation describes each neuron with (branching) centerlines and diameters, typically in the SWC format popularized by the simulator Neuron(Carnevale and Hines, 2006). These are necessarily approximations, since it is normally not possible (for example) to match both the cross sectional area and the surface area of each point along a neurite with such a representation. But SWC skeletons are a good representation for human viewing, adequate for automatic morphology classification, and serve as input to neural simulations such as Neuron. SWC files are also well accepted as an interchange format, used by projects such as NeuroMorpho(Ascoli et al., 2007) and FlyBrain(Shinomiya et al., 2011).
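For readers unfamiliar with SWC, each non-comment line holds seven whitespace-separated fields: node id, type code, x, y, z, radius, and parent id (with -1 marking the root). The following sketch reads such a file and sums the centerline length; the toy skeleton, the class name, and the function names are ours, and units are whatever the file uses.

```python
import math
from typing import NamedTuple


class SwcNode(NamedTuple):
    node_id: int
    type_code: int
    x: float
    y: float
    z: float
    radius: float
    parent_id: int   # -1 marks the root node


def parse_swc(text):
    """Parse SWC text into {node_id: SwcNode}, skipping comments and blanks."""
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        node = SwcNode(int(f[0]), int(f[1]), float(f[2]), float(f[3]),
                       float(f[4]), float(f[5]), int(f[6]))
        nodes[node.node_id] = node
    return nodes


def cable_length(nodes):
    """Sum of straight-line distances between each node and its parent."""
    total = 0.0
    for node in nodes.values():
        parent = nodes.get(node.parent_id)
        if parent is not None:
            total += math.dist((node.x, node.y, node.z),
                               (parent.x, parent.y, parent.z))
    return total


# Toy three-node skeleton, for illustration only.
EXAMPLE_SWC = """# id type x y z radius parent
1 0 0 0 0 1.0 -1
2 0 3 4 0 0.5 1
3 0 3 4 12 0.5 2
"""
print(cable_length(parse_swc(EXAMPLE_SWC)))
```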
The connectivity graph is also derived from the data and is yet more abstract, describing only the identity of neurons and a summary of how they connect - for example, Neuron ID1 connects to neuron ID2 through a certain number of synapses. In our case it also retains the brain region information and the location of each synapse. Such a connectivity graph is both smaller and faster than the geometric data, but sufficient for most queries of interest to biologists, such as finding the upstream or downstream partners of a neuron. A simple connectivity graph is often desired by theorists, particularly within brain regions, or when considering neural circuits in which each neuron can be represented as a single node.
A final, even more abstract form is the adjacency matrix: This compresses the connectivity between each pair of neurons to a single number. Even this most economical form requires careful treatment in connectomics. As our brain sample contains more than 25K traced neurons as well as many unconnected fragments, the adjacency matrix has more than a billion entries (most of which are zero). Sparse matrix techniques, which report only the non-zero coefficients, are necessary for practical use of such matrices.
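As an illustration of the sparse approach, the sketch below stores only the non-zero entries in a dictionary of dictionaries and treats every missing entry as zero. It is a minimal stand-in for a sparse matrix library, not the format we distribute; the function names and the toy edge list are hypothetical.

```python
from collections import defaultdict


def sparse_adjacency(edges):
    """Build {pre_id: {post_id: synapse_count}}, keeping non-zero entries only."""
    matrix = defaultdict(dict)
    for pre, post, count in edges:
        if count:
            matrix[pre][post] = matrix[pre].get(post, 0) + count
    return dict(matrix)


def weight(matrix, pre, post):
    """Synapse count from pre to post; absent entries are zero."""
    return matrix.get(pre, {}).get(post, 0)


# Toy edge list of (presynaptic id, postsynaptic id, synapse count).
toy_edges = [(1, 2, 5), (2, 1, 3), (1, 3, 0)]
m = sparse_adjacency(toy_edges)
print(weight(m, 1, 2), weight(m, 3, 1))
```

With N neurons, the dense matrix needs N × N entries regardless of connectivity, whereas this form grows only with the number of connected pairs.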
Accessing the data
For the hemibrain project we provide access to the data through a combination of a software interface(Clements et al., 2020) and a server (https://neuprint.janelia.org). Data are available in the form of gray-scale, pixel-level segmentation, skeletons, and a graph representation. Two previous connectomics efforts are available as well (a 7-column optic lobe reconstruction(Takemura et al., 2015) and the alpha lobe of the mushroom body(Takemura et al., 2017)). These can be found at https://neuprint-examples.janelia.org.
The most straightforward way to access the hemibrain data is through the Neuprint(Clements et al., 2020) interactive browser. This is a web-based application that is intended to be usable by biologists with minimal or no training. It allows the selection of neurons by name, type, or brain region, displays neurons, their partners, and the synapses between these in a variety of forms, and provides many of the graphs and summary statistics that users commonly want.
Neuprint also supports queries from languages such as Python(Sanner et al., 1999) and R, as used by the neuroanatomy tool NatVerse(Manton et al., 2019). Various formats are supported, including SWC format for the skeletons. In particular, the graph data can be queried through an existing graph query language, Cypher(Francis et al., 2018), as seen in the example below. The schema for the graph data is shown in Figure 19.
Schema for the neo4j graph model of the hemibrain. Each neuron contains 0 or more SynapseSets, each of which contains one or more synapses. All the synapses in a SynapseSet connect the same two neurons. If the details of the synapses are not needed, the neuron to neuron weight can be obtained as a property on the “ConnectsTo” relation, as can the distribution of this weight across different brain regions (the roiInfo).
MATCH (n:Neuron) - [c:ConnectsTo] -> (t:Neuron) WHERE t.type = 'MBON18' RETURN n.type, n.bodyId, c.weight ORDER BY c.weight DESCENDING
This query looks for all neurons that are presynaptic to any neuron of type ‘MBON18’. For each such neuron it returns the types and internal identities of the presynaptic neuron, and the count of synapses between them. The whole list is ordered in order of decreasing synapse count. This is just an illustration for a particular query that is quite common and supported in Neuprint without the need for any programming language.
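When the same question is asked for many cell types, it can be convenient to generate the query text programmatically. The sketch below only builds the Cypher string for a given target type; the function name is hypothetical, and how the string is then submitted (for example through the Neuprint web interface or API) is left to the reader.

```python
def upstream_partners_query(cell_type):
    """Return a Cypher query listing presynaptic partners of a cell type."""
    # Reject characters that could break out of the quoted string literal.
    if "'" in cell_type or "\\" in cell_type:
        raise ValueError("cell type must not contain quotes or backslashes")
    return (
        "MATCH (n:Neuron)-[c:ConnectsTo]->(t:Neuron) "
        f"WHERE t.type = '{cell_type}' "
        "RETURN n.type, n.bodyId, c.weight "
        "ORDER BY c.weight DESC"
    )


print(upstream_partners_query("MBON18"))
```

Swapping the direction of the arrow in the MATCH clause would give the downstream partners instead.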
Adjacency matrices, if needed, can be derived from the graph representation. We provide a small demonstration program that queries the API and generates such matrices, either with or without the brain regions. The two matrices themselves are available in gzipped Python format. For more information on accessing data and other hemibrain updates, please see https://www.janelia.org/project-teams/flyem/hemibrain.
Matching EM and light microscopy data
We registered the hemibrain EM data to the JRC2018 Drosophila template brain(Bogovic et al., 2018) using an automatic registration algorithm followed by manual correction. We began by using the automated T-bar predictions (described in section 2.3) to generate a T-bar density volume rendered at a resolution comparable to those from light microscopic images. This hemibrain synapse density volume was automatically registered to the template brain using ANTs(Avants et al., 2008), producing both a forward and inverse transform. The resulting registration was manually fine-tuned using BigWarp(Bogovic et al., 2016). The total transform is the composition of the ANTs and BigWarp transformations, and can be found at https://www.janelia.org/open-science/jrc-2018-brain-templates.
Given a particular neuron of interest, researchers can use these resources to identify GAL4 lines labeling that neuron. First the representation of the neuron must be spatially transformed into the template space to which GAL4 driver line images have previously been registered. A mask based approach(Otsuna et al., 2018) enables a search of GAL4 driver line image databases for particular neurons. Skeletonizing hemibrain neurons can enable the enquirer to query GAL4 neuronal skeleton databases using NBLAST(Costa et al., 2016).
Longer term storage of data, and archival references
Historically, archival data from biology have been expressed as files that are included with supplementary data. However, for connectivity data this practice has two main problems. First, the data are large, and hard to store. Journals, for example, typically limit supplemental data to a few 10s of megabytes. The data here are about 6 orders of magnitude larger. Second, connectome data are not static, during proofreading and even after initial publication. As proofreading proceeds, the data improve in their completeness and quality. The question then is how to refer to the data as they existed at some point in time, required for reproducibility of scientific results. If represented as files, this would require many copies, checkpointed at various times - the ‘as submitted’ version, the ‘as published’ version, the ‘current best version’, and so on.
We resolve this, at least for now, by hosting the data ourselves and making them available through query mechanisms. Underlying our connectome data is a versioned database (DVID) so it is technically possible to access every version of the data as it is revised. However, as it requires effort to host and format this data for the Neuprint browser and API, only selected versions (called named versions) are available by default from the website, starting with the initial version, which is ‘hemibrain:v1.0’. Although this is the only version currently, when reproducibility is required, such as when referencing the data in a paper, it is still best to refer explicitly to the milestone versions by name (such as ‘hemibrain:v1.0’) because we expect a new milestone version every few months, at least at first. We will supply a DOI for each of these versions, and each is archived, can be viewed and queried through the web browser and APIs at any time, and will not change.
The goal of multiple versions is that later versions should be of higher quality. Towards this end we have implemented several systems for reporting errors so we can correct them. Users can add annotations in NeuroGlancer(Perlman, 2019), the application used in conjunction with Neuprint to view image data, where they believe there are such errors. To make this process easier, we provide a video explaining it. We will review these annotations and amend those that we agree are problems. Users can also contact us via email about problems they find.
Archival storage is an issue since, unlike genetic data, there is not yet an institutional repository for connectomics data and the data are too large for journals to archive. We pledge to keep our data available for at least the next 10 years.
Analysis
Of necessity, most previous analyses have concentrated on particular circuits, cell types, or brain regions with relevance to specific functions or behaviors. For example, a classic paper about motifs(Song et al., 2005) sampled the connections between one cell type (layer 5 pyramidal neurons) in one brain region (rat visual cortex), and found a number of non-random features, such as over-represented reciprocal connections and a log-normal strength distribution. However, it has never been clear which of these observations generalize to other cell types, other brain regions, and the brain as a whole. We are now in a position to make much stronger statements, ranging over all brain regions and cell types.
In addition, many analyses are best performed (or can only be performed) on dense connectomes. Type-wide observations depend on a complete census of that cell type, and depending on the observation, a complete census of upstream and downstream partners as well. Some analyses, such as null observations about motifs (where certain motifs do not occur in all or portions of the fly’s brain) can only be undertaken on dense connectomes.
Compartment statistics
One analysis enabled by a dense whole-brain reconstruction involves the comparison between the circuit architectures of different brain areas within a single individual.
The compartments vary considerably. Table 3 shows the connectivity statistics of compartments that are completely contained within the volume, have at least 100 neurons, and have the largest or smallest value of various statistics. Across regions, the number of neurons varies by a factor of 74, the average number of partners of each neuron by a factor of 36, the network diameter by a factor of 4, the average strength of connection between partner neurons by a factor of 5, and the fraction of reciprocal connections by a factor of 5. The average graph distance between neurons is more conserved, differing by a factor of only 2.
Paths in the fly brain are short
Neurons in the fly brain are tightly interconnected, as shown in Figure 20, which plots what fraction of neuron pairs are connected as a function of the number of interneurons between them. Three quarters of all possible pairs are connected by a path with fewer than three interneurons, even when only connections with at least 5 synapses are included. If weaker connections are allowed, the paths become shorter yet. These short paths and tight coupling are very different from human designed systems, which have much longer path lengths connecting node pairs. As an example, a standard electrical engineering benchmark (S38584 from (Brglez et al., 1989)) is shown alongside the hemibrain data in Figure 20A-B. The connection graph for this example has roughly the same number of nodes as the graph of the fly brain, but pair-to-pair connections involve paths more than an order of magnitude longer – a typical node pair is separated by 60 intervening nodes. This is because a typical computational element in a human designed circuit (a gate) connects only to a few other elements, whereas a typical neuron receives input from, and sends outputs to, hundreds of other neurons.
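The quantity plotted, the number of interneurons between a pair, is one less than the shortest directed path length in edges. A minimal sketch of how it might be computed with a breadth-first search is shown here, using a synapse-count threshold to decide which connections count; the function name and the toy edge list are hypothetical, and this is not the code used to produce the figure.

```python
from collections import deque


def interneurons_between(edges, source, target, min_synapses=5):
    """Fewest intermediate neurons on a directed path, or None if unreachable."""
    graph = {}
    for pre, post, count in edges:
        if count >= min_synapses:          # ignore connections below threshold
            graph.setdefault(pre, []).append(post)
    distance = {source: 0}                 # path length in edges from source
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return max(distance[node] - 1, 0)
        for nxt in graph.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return None


# Toy graph: a strong two-step route and a weak direct connection.
toy = [("A", "B", 7), ("B", "C", 6), ("A", "C", 2)]
print(interneurons_between(toy, "A", "C", min_synapses=5))  # 1
print(interneurons_between(toy, "A", "C", min_synapses=1))  # 0
```

The toy case mirrors the observation in the text: lowering the threshold admits the weak direct connection, so the path becomes shorter.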
Plots of the percentage of pairs connected (of all possible) versus the number of interneurons required. (a) shows the data from the whole hemibrain, for up to 8 interneurons. (b) is a much wider view of the same data, shown on a log scale so the curve from a human designed system is visible.
Distribution of connection strength
The distribution of connection strengths has been studied in mammalian tissue, looking at specific cell types in specific brain areas. These findings, such as the log-normal distribution of connection strengths in rat cortex, do not appear to generalize to flies. Assuming the strength of a connection is proportional to the number of synapses in parallel, we can plot the distribution of connection strengths, summing over the whole central brain, as shown in Figure 21. We find a nearly pure power law with an exponential cutoff, very different from the log-normal distribution of strengths found by Song(Song et al., 2005) in pyramidal cells in the rat cortex, or the bimodal distribution found for pyramidal cells in the mouse by Dorkenwald(Dorkenwald et al., 2019). However, we caution that these analyses are not strictly comparable. Even aside from the very different species examined, the three analyses differ. Both Song and Dorkenwald looked at only one cell type, with excitatory connections only, but one looked at electrical strength while the other looked at synapse area as a proxy for strength. In our analysis, we use synapse count as a proxy for connection strength, and look at all cell types, including both excitatory and inhibitory synapses.
The number of connections with a given strength. Up to a strength of 100, this is well described by a power law (exponent -1.67) with exponential cutoff (at N=42).
Small Motifs
As mentioned earlier, there have been many studies of small motifs, usually involving limited circuits, cell types, and brain regions. We emphatically confirm some traditional findings, such as the over-representation of reciprocal connections. We observe this in all brain regions and among all cell types, confirming similar findings in the antennal lobe(Horne et al., 2018). This can now be assumed to be a general feature of the fly’s brain, and possibly all brains. In the fly, the incidence varies somewhat by compartment, however, as shown in Table 3.
Large motifs
We define a large motif as a graph structure that involves every cell of an abundant type (N ≥ 20). The most tightly bound motif is a clique, in which every cell of a given type is connected to every other cell of that type, with synapses in both directions. Such connections, as illustrated in Figure 22(a), are extremely unlikely in a random wiring model. Consider, for example, the clique of R4d_b cells found in the ellipsoid body, as shown in Table 4. In the ellipsoid body, two cells are connected with an average probability of 0.19. Therefore the odds of finding all 600 possible connections between R4d_b cells, assuming a random wiring model, is $0.19^{600} \approx 10^{-432}$.
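The random-wiring estimate is too small to evaluate directly in floating point, so it is easiest to check in logarithms. In the sketch below the function names are ours, and the cell count of 25 is an inference from the 600 directed connections quoted above (25 × 24 = 600, excluding self-connections) rather than a number stated in this section.

```python
import math
from itertools import permutations


def log10_clique_probability(n_cells, p_connect):
    """log10 of the chance that all directed pairs among n_cells are connected."""
    n_connections = n_cells * (n_cells - 1)   # ordered pairs, no self-connections
    return n_connections * math.log10(p_connect)


def is_clique(cells, connected):
    """True if every ordered pair of distinct cells appears in `connected`."""
    return all((a, b) in connected for a, b in permutations(cells, 2))


# Roughly -432.7, i.e. a probability of the order quoted in the text.
print(round(log10_clique_probability(25, 0.19), 1))

# Toy check on three cells with all six directed connections present.
cells = [1, 2, 3]
print(is_clique(cells, set(permutations(cells, 2))))
```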
Large motifs searched for. Squares represent abundant types with at least 20 instances. Circles represent sparse types with at most two instances. Panel (a) shows a clique, where all possible connections are present. (b) shows bidirectional connections between a sparse type and all instances of an abundant type. (c) shows unidirectional connections from all of an abundant type to a sparse type. (d) illustrates a cell type that does not form a clique overall, but does within each of two compartments.
In the fly’s brain, large cliques occur in only a few cases, as shown in Table 4. All true cliques are in the central complex, with a near-clique among the KCab-p cells of the mushroom body. The cells of type PFNa form an interesting case. There are 58 such cells, 29 on each side. They do not form a clique as shown in Figure 22(a), as there are few connections between the opposite sides. But within each side, the 29 cells on that side form a clique, as shown in Figure 22(d). The cliques within the central complex, and their potential operation, are discussed in detail in a companion paper.
The next most tightly bound motifs are individual cells that connect both to and from all cells of a given type, but are themselves of a different type. This is illustrated in Figure 22(b). Such a motif is often speculated to be a gain or sparseness controlling circuit, where the single neuron reads the collective activation of a population and then controls their collective behavior. A well known example is the APL neuron in the mushroom body, which connects both to and from all the Kenyon cells, and is thought to regulate the sparseness of the Kenyon cell activation(Lin et al., 2014).
We search for this motif by looking at cells with few instances (one or two) connecting bidirectionally to almost all cells (at least 90%) of an abundant type (N ≥ 20). We find this motif in three regions of the brain – it is common in the CX (73 different cells overseeing 22 cell types), the optic lobe circuits (19 cells overseeing 14 types), and somewhat in the MB (12 types overseeing 9 types). Spreadsheets containing these cell types, who they connect to, and the numbers and strengths of their connections are found in the supplementary data. We only analyze the optical circuits here, since the mushroom body and central complex are the subjects of companion papers. We observe three variations on this motif - a single cell connected to all of a type (Figure 23(a), found 5 times), a single cell with bi-directional connections to many types (Figure 23(b), found once), and multiple cells all connected bidirectionally to a single type (Figure 23(c), found 3 times). We find one circuit that is a combination: There is one cell that connects bidirectionally to all the LC17 neurons, and then a higher order cell that connects bidirectionally to a larger set (LPLC1, LPLC2, LLP1, LPC1, and LC17). In this case these are all looming-sensitive cells and hence these circuits may regulate the features of the overall looming responses. It is tempting to speculate that the more complex structures of Figure 23 (b) and (c) arose from the simpler structures of (a) through cell type duplication followed by divergence, but the connectomes of many more related species will be needed before this argument could be made quantitative.
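The search criterion stated at the start of this paragraph can be sketched in a few lines. This is an illustration of the criterion under the stated thresholds, not the code used for our analysis: the function name, the input layout (a cell-to-type map and a dictionary of directed synapse counts), and the toy cells are all hypothetical.

```python
from collections import defaultdict


def find_overseers(types, weights, min_abundant=20, max_sparse=2, coverage=0.9):
    """Return (cell, abundant_type) pairs matching the one-to-many motif.

    types:   {cell_id: type_name}
    weights: {(pre_id, post_id): synapse_count}
    """
    members = defaultdict(list)
    for cell, cell_type in types.items():
        members[cell_type].append(cell)
    abundant = {t: c for t, c in members.items() if len(c) >= min_abundant}
    sparse_cells = [c for cells in members.values()
                    if len(cells) <= max_sparse for c in cells]
    hits = []
    for cell in sparse_cells:
        for cell_type, cells in abundant.items():
            # Count members connected to this cell in both directions.
            both = sum(1 for m in cells
                       if weights.get((cell, m), 0) > 0
                       and weights.get((m, cell), 0) > 0)
            if both >= coverage * len(cells):
                hits.append((cell, cell_type))
    return hits


# Toy data: 20 "member" cells, one "hub" reciprocally connected to 19 of them,
# and one "sender" that only projects to them.
types = {f"m{i}": "member" for i in range(20)}
types.update({"hub": "hub_type", "sender": "sender_type"})
weights = {}
for i in range(19):
    weights[("hub", f"m{i}")] = 4
    weights[(f"m{i}", "hub")] = 3
for i in range(20):
    weights[("sender", f"m{i}")] = 5
print(find_overseers(types, weights))  # [('hub', 'member')]
```

Requiring only one direction instead of both would turn the same loop into a search for the unidirectional variant.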
One to many motifs found in the optic circuits. Individual neurons are named by unboxed text. Cell type names, in boxes, represent cells with many instances, with the number of instances shown beneath. The arrows show the average synapse count of each connection type. (a) shows an example of the most common case. Here one cell, AVL19m, has bidirectional connections to all cells of type LC13. (b) shows a single cell with exhaustive connections to several types. (c) shows an alternative motif where several cells form these one-to-many connections. For clarity the cell names have been truncated, with the suffix _pct (for putative cell type) removed.
The least tightly bound large motif is a cell that connects either to or from (but not both) all cells of a given type, as shown in Figure 22(c). Examples include the mushroom body output neurons(Takemura et al., 2017). This is a very common motif, found in many regions. We find more than 500 examples of this in the fly’s brain.
Brain regions and electrical response
How does the compartmentalization of the fly brain affect neural computation? In a few cases this has been established. For example, the CT1 neuron performs largely independent computations in each branch(Meier and Borst, 2019), whereas estimates show that within the medulla, the delays within each neuron are likely not significant for single column optic lobe neurons, and hence the neurons likely perform only a single computation(Takemura et al., 2013). Similarly, compartments of PEN2 neurons in the protocerebral bridge have been shown to respond entirely differently from their compartments in the ellipsoid body(Green et al., 2017)(Turner-Evans et al., 2019).
Our detailed skeleton models allow us to construct electrical models of neurons. In particular, to look more generally at the issues of intra– vs inter–compartment delays and amplitudes, we can construct a linear passive model for each neuron. Our method is similar to that elsewhere(Segev et al., 1985), except that instead of using right cylinders, we represent each segment of the skeleton as a truncated cone. This is then used to derive the axonic resistance, the membrane resistance, and membrane capacitance for each segment. To analyze the effect of compartment structure on neuron operation, we inject the neuron at a postsynaptic density (input) with a signal corresponding to a typical synaptic input (1 nS conductance, 1 ms width, 0.1 ms rise time constant, 1 ms fall time constant, 60 mV reversal potential). We then compute the response at each of the T-bar sites (outputs). Since the synapses, both input and output, are annotated by the brain region that contains them, this allows us to calculate the amplitudes and delays from each synapse (or a sample of synapses) in each compartment to each output synapse in all other compartments.
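The per-segment quantities for a truncated cone follow from elementary geometry, and can be sketched as below. This is an illustrative calculation under stated assumptions rather than the code used for the models in this paper: the function name `frustum_segment` is hypothetical, SI units are assumed throughout, the membrane area is taken to be the lateral surface of the cone only, and the example values of Ra and Rm are placeholders (only Cm = 0.01 F/m² is taken from the text).

```python
import math


def frustum_segment(length, r1, r2, Ra, Rm, Cm):
    """Passive parameters of one skeleton segment modeled as a truncated cone.

    length, r1, r2: segment length and end radii in metres
    Ra: specific axial (cytoplasmic) resistivity in ohm*m
    Rm: specific membrane resistance in ohm*m^2
    Cm: specific membrane capacitance in F/m^2
    Returns (axial resistance in ohm, membrane resistance in ohm,
             membrane capacitance in F).
    """
    if length <= 0 or r1 <= 0 or r2 <= 0:
        raise ValueError("length and radii must be positive")
    # Integrating Ra / (pi * r(x)^2) along a linearly tapering radius
    # gives Ra * L / (pi * r1 * r2).
    axial = Ra * length / (math.pi * r1 * r2)
    # Lateral surface area of a truncated cone: pi * (r1 + r2) * slant height.
    slant = math.hypot(length, r1 - r2)
    area = math.pi * (r1 + r2) * slant
    return axial, Rm / area, Cm * area


# Example: a 1 micron segment tapering from 0.2 to 0.1 micron radius.
# Ra and Rm here are placeholder values, not measurements.
axial, r_membrane, c_membrane = frustum_segment(
    length=1e-6, r1=0.2e-6, r2=0.1e-6, Ra=1.0, Rm=1.0, Cm=0.01)
```

When `r1 == r2` the expressions reduce to the familiar right-cylinder formulas, which provides a simple sanity check on the taper correction.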
In general, we find the ROI structure of the neuron is clearly reflected in the electrical response. Consider, for example, the EPG neuron (Figure 24(a)) with arbors in the ellipsoid body, the protocerebral bridge, and the gall. Figure 25(a) shows the responses to synaptic input in the gall. Within the gall, the delays are very short, and the amplitude relatively high and variable, depending somewhat on the input and output synapse within the gall. From the gall to other regions the delays are longer (typically a few milliseconds) and the amplitudes much smaller and nearly constant, largely independent of the exact transmitting and receiving synapse. There is a very clean separation between the within-ROI and across-ROI delays and amplitudes, as shown in Figure 25(a). The same overall behavior is true for inputs into the other regions - short delays and strong responses within the ROI, with longer delays and smaller amplitudes to other compartments.
(a) An EPG neuron, with arbors in three compartments. (b) Two neurons that connect in more than one ROI, in this case the calyx and the lateral horn. They are each pre- and postsynaptic to each other in both compartments.
(a) The linear response to inputs in the gall (GA) for an EPG neuron, which also has arbors in the ellipsoid body (EB) and the protocerebral bridge (PB). Each point in the modeled plot shows the time each response reached its peak amplitude (the delay), and the amplitude at that time, for an input injected at one of the PSDs in the gall. (b) Delays and amplitudes for gall to PB response, for all combinations of three values of cytoplasmic resistance RA and three values of membrane resistance RM.
This simple pattern motivates a model that describes delays and amplitudes not as a single number, but as an N×N matrix, where N is the number of ROIs. Each row contains the estimated amplitude and delay, measured in each compartment, for a synaptic input in the given compartment. This gives a much improved estimate of the linear response. For the example EPG neuron above, with nominal values for Ra, Rm, and Cm, if we represent all delays by a single number then the standard deviation of the error is 0.446 ms. If instead we represent the delays as a 3×3 matrix indexed by the compartment, the average error is 0.045 ms, for 10x greater accuracy. Similarly, the average error in amplitude drops from 0.168 mV to 0.021 mV, an eightfold improvement. While the improvement in error will depend on the neuron topology, in all cases it will be more accurate than a point model, for relatively little increase in complexity.
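The comparison between a single-number model and an ROI-indexed matrix can be illustrated as follows. This is a sketch only: the delay values are invented toy numbers (not the EPG results quoted above), the helper names are hypothetical, and the error is summarized here as a root-mean-square value, which is an assumption about the error measure.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def fit_point_model(samples):
    """One delay for the whole neuron: the mean over all synapse pairs."""
    return mean(delay for _, _, delay in samples)


def fit_roi_matrix(samples):
    """One delay per (input ROI, output ROI) pair: the mean within each pair."""
    groups = defaultdict(list)
    for src, dst, delay in samples:
        groups[(src, dst)].append(delay)
    return {pair: mean(values) for pair, values in groups.items()}


def rms_error(samples, predict):
    """Root-mean-square difference between simulated and predicted delays."""
    return sqrt(mean((delay - predict(src, dst)) ** 2
                     for src, dst, delay in samples))


# Toy (input ROI, output ROI, delay in ms) samples; values are illustrative.
samples = [
    ("GA", "GA", 0.1), ("GA", "GA", 0.3),
    ("GA", "PB", 2.0), ("GA", "PB", 2.2),
    ("GA", "EB", 1.5), ("GA", "EB", 1.7),
]

point = fit_point_model(samples)
matrix = fit_roi_matrix(samples)
point_err = rms_error(samples, lambda src, dst: point)
matrix_err = rms_error(samples, lambda src, dst: matrix[(src, dst)])
```

Because each matrix entry is the mean of its own group, the matrix model can never have a larger squared error than the single-number model on the data it was fitted to; the size of the improvement depends on how well separated the within-ROI and across-ROI responses are.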
The absolute values of delay and amplitude are strongly dependent on the electrical parameters of the cell, however. A wide range of electrical properties have been reported in the fly literature (see Table 5) and it is plausible that these vary on a cell-to-cell basis. We therefore simulate with minimum, medium, and maximal values of Ra and Rm, for a total of 9 cases, as shown in Figure 25(b). All are needed since the resistance parameters interact non-linearly. We fix the value of Cm at 0.01 F/m² since this value is determined by the membrane thickness and is not expected to vary from cell to cell(Kandel et al., 2000). The results over the parameter range are shown in Figure 25(b) for the case of the EPG neuron above for delay from the gall to the PB. The intra-ROI and between-ROI values are well separated for any value of the parameters (not shown).
Programs that deduce synaptic strength and sign by fitting a computed response to a connectome and measured electrical or calcium imaging data(Tschopp et al., 2018) may at some point require estimates of the delays within cells. If this is required, the above results suggest this could be accomplished with reasonable accuracy with an ROI-to-ROI delay table and 2 additional parameters per neuron, RA and RM. This is relatively few new parameters in addition to the many synaptic strengths already fitted.
A number of neurons have parallel connections in separate ROIs (see Figure 24(b)). This motif is common in the fly’s brain – about 5% of all connections having a strength ≥ 6 are spread across two or more non-adjacent ROIs. Given the increased delays and lower amplitudes of cross-compartment responses, this type of interaction differs electrically from those in which all connections are contained in a single ROI. A point neuron model cannot generate an accurate response for such connections – a synapse in region A will result in a fast response in A and a slower, smaller response in B, and vice versa, even though both of these events involve communication between the same two neurons. It is not known if this configuration has a significant influence on the neurons’ operation.
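Connections of this kind can be identified from a per-synapse table annotated with ROI, as sketched below. Several assumptions are made for illustration: the function name `multi_roi_connections` and the input format are hypothetical, the strength value 6 quoted above is treated as a minimum synapse count, and the requirement that the ROIs be non-adjacent is not checked here since it would need an ROI adjacency table.

```python
from collections import defaultdict


def multi_roi_connections(synapses, min_strength=6):
    """Group synapses by (pre, post) pair and by ROI.

    synapses: iterable of (pre, post, roi) tuples, one per synapse
              (hypothetical input format).
    Returns (strong, split): `strong` maps every pair with at least
    `min_strength` synapses to its per-ROI counts; `split` is the subset
    whose synapses fall in two or more ROIs. ROI adjacency is NOT checked.
    """
    per_pair = defaultdict(lambda: defaultdict(int))
    for pre, post, roi in synapses:
        per_pair[(pre, post)][roi] += 1

    strong = {pair: dict(counts) for pair, counts in per_pair.items()
              if sum(counts.values()) >= min_strength}
    split = {pair: counts for pair, counts in strong.items()
             if len(counts) >= 2}
    return strong, split


# Toy synapse list; neuron names are placeholders, ROI labels stand for
# the calyx (CA) and lateral horn (LH).
synapses = (
    [("A", "B", "CA")] * 4 + [("A", "B", "LH")] * 3   # 7 synapses, two ROIs
    + [("A", "C", "CA")] * 6                          # 6 synapses, one ROI
    + [("B", "A", "CA")] * 2 + [("B", "A", "LH")]     # below threshold
)
strong, split = multi_roi_connections(synapses)
fraction_split = len(split) / len(strong)
```

The ratio `fraction_split` corresponds to the kind of percentage quoted in the text, computed here over toy data only.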
From these models we conclude that (a) the compartment structure of the fly brain shows up directly in the electrical response of the neurons; (b) the compartment structure, though defined anatomically, matches that of the electrical response, and from the clear separation in Figure 25 it is likely that the same compartment definitions could be found starting with the electrical response, though we have not tried this; (c) these results suggest a low dimensional model for neural operation, at least in the linear region, in which a small region-to-region matrix can represent the delays and amplitudes well; (d) absolute delays depend strongly (but in a very predictable manner) on the values of axial and membrane resistance, which can vary both from animal to animal and from cell to cell; and (e) neurons that have parallel connections in separate ROIs have a different electrical response than they would have with the same total number of synapses in a single ROI.
Rent’s rule analysis
Rent’s rule(Lanzerotti et al., 2005) is an empirical observation that in human designed computing systems, when the system is packed as tightly as possible, at every level of the hierarchy the required communication (the number of pins) scales as a power law of the amount of contained computation, measured in gates. Rent’s rule is an observed relationship, not derived from underlying theory, and the relationship is not exact and still contains scatter. A biological equivalent might be the observation that brain size tends to vary as a power law of body size(Harvey and Krebs, 1990), across a wide range of species occupying very different ecological and behavioral niches. Rent’s rule is roughly true over many orders of magnitude in scale, and for almost every system in which it has been measured. Somewhat surprisingly, Rent’s rule applies almost independently of the function the computation performs, and at every level of a hierarchical system. It also applies whether the compactness criterion is minimization of communication (partitioning) or physical close packing.
Rent’s rule is expressed as

$$\text{pins} = a \cdot \text{gates}^{b}$$
where a is a scale factor (typically in the range 1-4), and b is the ‘Rent exponent’ describing how the number of connections to the compartment varies as a function of the amount of computation performed in the compartment. The Rent exponent has a theoretical range of 0.0 to 1.0, where 0 represents a constant number of connections, with no dependence on the amount of computation performed, and 1.0 represents a circuit in which every computation is visible on a connection. Human designed computational systems occupy almost the full range, from spreadsheets in which every computation is visible, to largely serial systems in which minimizing communication (pins) is critical. This relationship is shown in Figure 26. However, when the overriding criterion is that the system must be packed as tightly as possible, Rent observed that the exponent of the power law falls in a close range of roughly 0.5-0.7.
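Given measurements of computation and connection count at several scales, the scale factor and Rent exponent can be estimated by a straight-line fit in log-log space, assuming the usual power-law form connections = a · computation^b. The sketch below is illustrative only: the function name `fit_rent` is hypothetical, ordinary least squares is one choice of fitting method among several, and the data are synthetic values generated from a = 2 and b = 0.6 rather than measurements.

```python
import math


def fit_rent(computation, connections):
    """Least-squares fit of log(connections) = log(a) + b * log(computation).

    Returns (a, b), where b is the Rent exponent.
    """
    xs = [math.log(g) for g in computation]
    ys = [math.log(p) for p in connections]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space
    a = math.exp(mean_y - b * mean_x)    # intercept mapped back to linear scale
    return a, b


# Synthetic example: exact power law with a = 2 and b = 0.6.
gates = [10, 100, 1000, 10000]
pins = [2.0 * g ** 0.6 for g in gates]
a, b = fit_rent(gates, pins)
```

On exact power-law data the fit recovers the generating parameters; real measurements contain scatter, so the fitted exponent should be read as an estimate.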
Rent’s rule for the hemibrain. The yellow region is the theoretical bounds for computation. Human systems designed for visibility into computation achieve the upper bound, while human designed systems designed for minimum communication approach the lower bounds (Microprocessors ST7LU55, LPC1102, and STM32). Human designed systems where efficient packing is the main criterion occupy the shaded area (in 2D and 3D). The hemibrain compartments fall very nearly in the same range as human designed systems.
For electrical circuits, the computation is measured in gates, and the connections are measured by pin count. These ranges are shown in Figure 26 for circuits that are roughly the size of the fly’s brain, packed in either two(Yang et al., 2001) or three(Das et al., 2004) dimensions.
Also shown in this plot are the values for the fly’s brain computational regions. In this case, the computation is measured as the number of contained T-bars, and the connection count is the number of neurons that have at least one synapse both inside and outside the compartment. (Very similar results are obtained if the computation is measured as the number of PSDs, or the number of unique connection pairs). Almost all the fly brain compartments fall well within the range of exponents expected for packing-dominated systems, while the ellipsoid body (EB) falls just outside the expected area. This is perhaps due to the large number of clique-containing circuits in the ellipsoid body (see Table 4), since such circuits have few connections for the number of synapses they contain.
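The two quantities plotted for each compartment can be computed directly from ROI-annotated synapses, as in the following sketch. The function name `compartment_counts`, the input format, and the toy neurons are hypothetical; the definitions follow the text (T-bars inside the compartment as computation, and neurons with at least one synapse inside and at least one outside as connections).

```python
def compartment_counts(synapses_by_neuron, roi):
    """Return (computation, connections) for one compartment.

    synapses_by_neuron: dict mapping neuron id -> list of (kind, roi_name),
                        with kind either "tbar" or "psd" (hypothetical format).
    computation: number of T-bars located inside `roi`.
    connections: number of neurons with synapses both inside and outside `roi`.
    """
    tbars = 0
    crossing = 0
    for synapses in synapses_by_neuron.values():
        inside = outside = False
        for kind, where in synapses:
            if where == roi:
                inside = True
                if kind == "tbar":
                    tbars += 1
            else:
                outside = True
        if inside and outside:
            crossing += 1
    return tbars, crossing


# Toy example with three neurons and two ROIs.
synapses_by_neuron = {
    "n1": [("tbar", "EB"), ("psd", "EB")],   # entirely inside EB
    "n2": [("tbar", "EB"), ("psd", "PB")],   # crosses the EB boundary
    "n3": [("tbar", "PB")],                  # entirely outside EB
}
computation, connections = compartment_counts(synapses_by_neuron, "EB")
```

Repeating this for every compartment yields one (computation, connections) point per ROI, which is the form of data needed for a Rent-style log-log plot.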
Both human designed and biological systems have huge incentives to pack their computation as tightly as possible. A tighter packing of the same computation yields faster operation, lower energy consumption, less material cost, and lower mass. A natural speculation, therefore, is that both the human-designed and evolved systems are dominated by packing considerations, and that both have found similar solutions.
Conclusions and future work
In this work we have achieved a dream of anatomists that is more than a century old. For at least the central brain of at least one animal with a complex brain and sophisticated behavior, we have a complete census of all the neurons and all the cell types that constitute the brain, a definitive atlas of the regions in which they reside, and a graph representing how they are connected.
To achieve this, we have made improvements to every stage of the reconstruction process. Better means of sample preparation, imaging, alignment, segmentation, synapse finding, and proofreading are all summarized in this work and will form the basis of yet larger and faster reconstructions in the future.
We have provided the data for all the circuits of the central brain, at least as defined by nerve cells and chemical synapses. This includes not only circuits of regions that are already the subject of extensive study, but also a trove of circuits whose structure and function are yet unknown.
We have provided a public resource that should be a huge help to all who study fly neural circuits. Finding upstream and downstream partners, a task that until now has typically taken months of challenging experiments, is now replaced by a lookup on a publicly available web site. Detailed circuits, which used to require considerable patience and expertise to acquire, are now available for the cost of an internet query.
More widely, a dense connectome is a valuable resource for all neuroscientists, enabling novel, system-wide analyses, as well as suggesting roles for specific pathways. A surprising revelation is the richness of anatomical synaptic engagements, which far exceeds pathways required to support identified fly behaviors, and suggests that most behaviors have yet to be identified.
Finally, we have started the process of analyzing the connectome, though much remains to be done. We have quantified the difference between computational compartments, determined that the distribution of strengths is different from that reported in mammals, discovered cliques and other structures and where these occur, examined the effect of compartmentalization on electrical properties, and provided evidence that the wiring of the brain is consistent with optimizing packing.
Many of the extensions of this work are obvious and already underway. Not all regions of the hemibrain have been read to the highest accuracy possible, insofar as we have concentrated first on the regions overlapping with other projects, such as the central complex and the mushroom body. We will continue to update other sections of the brain, and distributed circuits such as clock neurons that are not confined to one region, but spread throughout the brain.
There is much more to be learned about the graph properties of the brain, and how these relate to its function.
The two sexes of the Drosophila brain are known to differ(Auer and Benton, 2016), so reconstructing a male fly is critical to compare the circuits of the two sexes. The ventral nerve cord (VNC) should be included since the circuits in the VNC are known to be crucial for fly motor behavior(Yellman et al., 1997). At least one optic lobe should be included to simplify analysis of visual inputs to the central brain. A whole brain connectome is preferable to the hemibrain, since then most cell types would have at least two examples, left and right, which would lend increased confidence to our reconstructions. It would also provide complete reconstruction to the many neurons that span the brain, especially the clock neurons, and are incomplete in the hemibrain. These three goals are combined in a project that is currently underway, to image and reconstruct an entire male central nervous system (CNS) including the VNC and optic lobes.
We continue to improve sample preparation, imaging, and reconstruction both to decrease the efforts expended on reconstruction and to speed reconstruction of more specimens. Improvements include multi-beam imaging, etching methods(Hayworth et al., 2019) that can handle larger areas, and yet better reconstruction techniques.
Reviewed Preprint
This preprint has been reviewed by eLife. Authors have responded but not yet submitted a revised edition
- Author response
- Mar 6, 2022
- Peer review
- Mar 3, 2022
- Preprint posted
- Nov 8, 2021
Editors and reviewers
- Reviewing Editor: Eve Marder, Brandeis University, United States
- Senior Editor: Michael B Eisen, University of California, Berkeley, United States
- Reviewer 1: Jason Pipkin, Brandeis University, United States
- Reviewer 2: Anonymous
- Eve Marder, Reviewing Editor; Brandeis University, United States
- Michael B Eisen, Senior Editor; University of California, Berkeley, United States
- Jason Pipkin, Reviewer; Brandeis University, United States
- Chris Q Doe, Reviewer; Howard Hughes Medical Institute, University of Oregon, United States
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
We consider this work to be a tour de force achievement on several fronts. Technologically, it ties together nearly a decade of advances in sample preparation, imaging, data management, and image analysis. It also is a very complete automated reconstruction of an EM volume that allows the authors to carefully begin the process of labeling subregions of the neuropil, derive cell types on the basis of both structure and connectivity, and identify circuit motifs, and is a demonstration of what connectomics has always promised to deliver: a reference atlas for biologists and a springboard for theoreticians and modelers working anywhere between the single-cell and whole network levels. We anticipate that this paper and its tools will facilitate the work from numerous laboratories around the world.
Decision letter after peer review:
Thank you for submitting your article "A Connectome and Analysis of the Adult Drosophila Central Brain" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Michael Eisen as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Jason Pipkin (Reviewer #1) and Chris Q Doe (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.
Summary:
This paper is viewed as a landmark contribution to the methodologies of EM connectomics and its use to characterize the Drosophila brain. The manuscript is extensive and well-illustrated, and the reviewers and editors are pleased to help to make this work available to the public. I am taking the unusual (for eLife) action to include the two reviews in their entirety, as they include constructive comments that were intended by these two careful readers to make the paper more accessible and more useful for the community. I hope that you will take into consideration these comments, and make those editorial changes that will strengthen the paper. In particular, reviewer 2's major request for additional information seems critical for the paper to be maximally useful to the community.
Title: Reviewer 2 suggests a change in the title for your consideration.
Reviewer #1:
The work presented by Scheffer et al. here is a tour de force achievement on several fronts. Technologically, it ties together nearly a decade of advances in sample preparation, imaging, data management, and image analysis. Most impressively, this represents – to my knowledge – the densest and most complete automated reconstruction of an EM volume of this size. While at least one larger volume has been generated from the adult fly brain (Davi Bock's TEMCA work), it has not been segmented (yet) to the level of completion presented here. (Though I am curious to hear the authors' thoughts on to what extent the overall automated segmentation strategy used herein is truly dependent on the isotropic voxels or if a similar set of networks could be retrained on anisotropic data from other existing volumes. One can imagine the value in validating connectivity in another sample that's already been imaged.)
The completeness of the hemibrain connectome enables the authors to carefully begin the process of labeling subregions of the neuropil, derive cell types on the basis of both structure and connectivity, and identify circuit motifs. They also show that the segmented skeletons enable a first pass at building detailed neuronal models at the single-cell level. Therefore this work is not just the presentation of a volume of data (itself impressive) but also a demonstration of what connectomics has always promised to deliver: a reference atlas for biologists and a springboard for theoreticians and modelers working anywhere between the single-cell and whole network levels.
I have no major critiques of this manuscript. Some of the figures could be more striking – or at least not set to Matlab defaults in terms of colors and box ticks (Figures 17, 20, 21 and 25). Others are beautiful (Figures 8 and 10, e.g.).
Finally, I commend the authors for building out the online portal for others to interact with their data. This is an achievement on its own, and probably the most important one for yielding the greatest scientific returns from their efforts.
Reviewer #2:
This massive work describes new methods for generating EM data on large chunks of nervous system – 250 x 250 μm adult central brain – which includes all of one side of the bilateral brain plus all of the central brain midline structures such as the central complex. Thus, it has an n = 1 for most brain neurons. It excludes most of the optic lobe, and all of the ascending/descending neurons, SEZ and VNC. The paper contains comprehensive analyses of the data set, including motif structure, classifying cell types, and adjusting brain neuropil boundaries. The Neuprint software is elegant and intuitive.
Importantly, this data set and associated software provide a method to transition from a light level neuron morphology (e.g. from a FlyLight neuron to a Neuprint neuron). While this needs further development (see comment below), it has the potential to save years of experimental analysis to reach the same point.
This data set will be the gold standard until the full CNS reconstruction is finished in the future. The quality of the EM data is extremely high based on images shown and data in Neuroglancer. As mentioned above, this is a massive work in many regards.
My only required major comment is to expand the section "Matching EM and light microscopy data" as this is an extremely important advance, and perhaps one of the most useful aspects of the entire manuscript. I think the most useful improvement would be to give an example from beginning (FlyLight neuron) to end (matching neuron in Neuprint). This can be another figure, or perhaps better as numbered text instructions with full URLs for each required step. Or a third option, provide an example workflow on a Janelia page and link to it here. As it stands, I was unable to perform this function with the available information in the paper.
https://doi.org/10.7554/eLife.57443.sa1